rafcode 2.3.0 → 2.4.1-0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (129)
  1. package/.claude/settings.local.json +3 -1
  2. package/CLAUDE.md +21 -4
  3. package/RAF/ahvrih-rate-forge/decisions.md +70 -0
  4. package/RAF/ahvrih-rate-forge/input.md +44 -0
  5. package/RAF/ahvrih-rate-forge/outcomes/01-remove-claude-command-config.md +58 -0
  6. package/RAF/ahvrih-rate-forge/outcomes/02-fix-mixed-attempt-cost.md +46 -0
  7. package/RAF/ahvrih-rate-forge/outcomes/03-rate-limit-estimation.md +82 -0
  8. package/RAF/ahvrih-rate-forge/outcomes/04-show-version-in-do-logs.md +45 -0
  9. package/RAF/ahvrih-rate-forge/outcomes/05-sync-main-before-worktree.md +96 -0
  10. package/RAF/ahvrih-rate-forge/outcomes/06-sync-readme-with-codebase.md +45 -0
  11. package/RAF/ahvrih-rate-forge/outcomes/07-no-session-persistence.md +26 -0
  12. package/RAF/ahvrih-rate-forge/outcomes/08-plan-execution-metadata.md +130 -0
  13. package/RAF/ahvrih-rate-forge/plans/01-remove-claude-command-config.md +36 -0
  14. package/RAF/ahvrih-rate-forge/plans/02-fix-mixed-attempt-cost.md +33 -0
  15. package/RAF/ahvrih-rate-forge/plans/03-rate-limit-estimation.md +82 -0
  16. package/RAF/ahvrih-rate-forge/plans/04-show-version-in-do-logs.md +32 -0
  17. package/RAF/ahvrih-rate-forge/plans/05-sync-main-before-worktree.md +40 -0
  18. package/RAF/ahvrih-rate-forge/plans/06-sync-readme-with-codebase.md +61 -0
  19. package/RAF/ahvrih-rate-forge/plans/07-no-session-persistence.md +28 -0
  20. package/RAF/ahvrih-rate-forge/plans/08-plan-execution-metadata.md +123 -0
  21. package/RAF/ahwidh-quick-fix-gremlin/decisions.md +37 -0
  22. package/RAF/ahwidh-quick-fix-gremlin/input.md +35 -0
  23. package/RAF/ahwidh-quick-fix-gremlin/outcomes/01-fix-name-generation-prompt.md +33 -0
  24. package/RAF/ahwidh-quick-fix-gremlin/outcomes/02-fix-amend-commit-scope.md +43 -0
  25. package/RAF/ahwidh-quick-fix-gremlin/outcomes/03-fix-diverged-main-branch-sync.md +32 -0
  26. package/RAF/ahwidh-quick-fix-gremlin/outcomes/04-wire-rate-limit-to-do-command.md +61 -0
  27. package/RAF/ahwidh-quick-fix-gremlin/outcomes/05-add-config-get-set-flags.md +125 -0
  28. package/RAF/ahwidh-quick-fix-gremlin/outcomes/06-sync-worktree-branch-before-execution.md +96 -0
  29. package/RAF/ahwidh-quick-fix-gremlin/outcomes/07-update-frontmatter-format.md +107 -0
  30. package/RAF/ahwidh-quick-fix-gremlin/outcomes/08-remove-plan-token-report.md +76 -0
  31. package/RAF/ahwidh-quick-fix-gremlin/plans/01-fix-name-generation-prompt.md +52 -0
  32. package/RAF/ahwidh-quick-fix-gremlin/plans/02-fix-amend-commit-scope.md +48 -0
  33. package/RAF/ahwidh-quick-fix-gremlin/plans/03-fix-diverged-main-branch-sync.md +49 -0
  34. package/RAF/ahwidh-quick-fix-gremlin/plans/04-wire-rate-limit-to-do-command.md +78 -0
  35. package/RAF/ahwidh-quick-fix-gremlin/plans/05-add-config-get-set-flags.md +101 -0
  36. package/RAF/ahwidh-quick-fix-gremlin/plans/06-sync-worktree-branch-before-execution.md +92 -0
  37. package/RAF/ahwidh-quick-fix-gremlin/plans/07-update-frontmatter-format.md +105 -0
  38. package/RAF/ahwidh-quick-fix-gremlin/plans/08-remove-plan-token-report.md +50 -0
  39. package/README.md +27 -7
  40. package/dist/commands/config.d.ts.map +1 -1
  41. package/dist/commands/config.js +209 -6
  42. package/dist/commands/config.js.map +1 -1
  43. package/dist/commands/do.d.ts.map +1 -1
  44. package/dist/commands/do.js +140 -21
  45. package/dist/commands/do.js.map +1 -1
  46. package/dist/commands/plan.d.ts.map +1 -1
  47. package/dist/commands/plan.js +27 -5
  48. package/dist/commands/plan.js.map +1 -1
  49. package/dist/core/claude-runner.d.ts +0 -6
  50. package/dist/core/claude-runner.d.ts.map +1 -1
  51. package/dist/core/claude-runner.js +4 -9
  52. package/dist/core/claude-runner.js.map +1 -1
  53. package/dist/core/failure-analyzer.d.ts.map +1 -1
  54. package/dist/core/failure-analyzer.js +3 -3
  55. package/dist/core/failure-analyzer.js.map +1 -1
  56. package/dist/core/pull-request.js +3 -3
  57. package/dist/core/pull-request.js.map +1 -1
  58. package/dist/core/state-derivation.d.ts +5 -0
  59. package/dist/core/state-derivation.d.ts.map +1 -1
  60. package/dist/core/state-derivation.js +14 -4
  61. package/dist/core/state-derivation.js.map +1 -1
  62. package/dist/core/worktree.d.ts +44 -0
  63. package/dist/core/worktree.d.ts.map +1 -1
  64. package/dist/core/worktree.js +247 -0
  65. package/dist/core/worktree.js.map +1 -1
  66. package/dist/prompts/amend.d.ts.map +1 -1
  67. package/dist/prompts/amend.js +28 -11
  68. package/dist/prompts/amend.js.map +1 -1
  69. package/dist/prompts/planning.d.ts.map +1 -1
  70. package/dist/prompts/planning.js +28 -11
  71. package/dist/prompts/planning.js.map +1 -1
  72. package/dist/types/config.d.ts +30 -13
  73. package/dist/types/config.d.ts.map +1 -1
  74. package/dist/types/config.js +14 -10
  75. package/dist/types/config.js.map +1 -1
  76. package/dist/utils/config.d.ts +47 -4
  77. package/dist/utils/config.d.ts.map +1 -1
  78. package/dist/utils/config.js +176 -30
  79. package/dist/utils/config.js.map +1 -1
  80. package/dist/utils/frontmatter.d.ts +53 -0
  81. package/dist/utils/frontmatter.d.ts.map +1 -0
  82. package/dist/utils/frontmatter.js +115 -0
  83. package/dist/utils/frontmatter.js.map +1 -0
  84. package/dist/utils/name-generator.d.ts.map +1 -1
  85. package/dist/utils/name-generator.js +9 -19
  86. package/dist/utils/name-generator.js.map +1 -1
  87. package/dist/utils/session-parser.d.ts +44 -0
  88. package/dist/utils/session-parser.d.ts.map +1 -0
  89. package/dist/utils/session-parser.js +122 -0
  90. package/dist/utils/session-parser.js.map +1 -0
  91. package/dist/utils/terminal-symbols.d.ts +22 -3
  92. package/dist/utils/terminal-symbols.d.ts.map +1 -1
  93. package/dist/utils/terminal-symbols.js +52 -18
  94. package/dist/utils/terminal-symbols.js.map +1 -1
  95. package/dist/utils/token-tracker.d.ts +20 -0
  96. package/dist/utils/token-tracker.d.ts.map +1 -1
  97. package/dist/utils/token-tracker.js +57 -2
  98. package/dist/utils/token-tracker.js.map +1 -1
  99. package/package.json +1 -1
  100. package/src/commands/config.ts +242 -7
  101. package/src/commands/do.ts +177 -23
  102. package/src/commands/plan.ts +27 -4
  103. package/src/core/claude-runner.ts +4 -16
  104. package/src/core/failure-analyzer.ts +3 -3
  105. package/src/core/pull-request.ts +3 -3
  106. package/src/core/state-derivation.ts +20 -4
  107. package/src/core/worktree.ts +266 -0
  108. package/src/prompts/amend.ts +28 -11
  109. package/src/prompts/config-docs.md +91 -29
  110. package/src/prompts/planning.ts +28 -11
  111. package/src/types/config.ts +46 -21
  112. package/src/utils/config.ts +200 -33
  113. package/src/utils/frontmatter.ts +140 -0
  114. package/src/utils/name-generator.ts +9 -19
  115. package/src/utils/terminal-symbols.ts +68 -16
  116. package/src/utils/token-tracker.ts +65 -2
  117. package/tests/unit/claude-runner-interactive.test.ts +8 -6
  118. package/tests/unit/claude-runner.test.ts +5 -66
  119. package/tests/unit/commit-planning-artifacts-worktree.test.ts +6 -14
  120. package/tests/unit/commit-planning-artifacts.test.ts +4 -12
  121. package/tests/unit/config-command.test.ts +176 -6
  122. package/tests/unit/config.test.ts +268 -45
  123. package/tests/unit/frontmatter.test.ts +276 -0
  124. package/tests/unit/name-generator.test.ts +1 -1
  125. package/tests/unit/post-execution-picker.test.ts +6 -0
  126. package/tests/unit/terminal-symbols.test.ts +142 -0
  127. package/tests/unit/token-tracker.test.ts +304 -1
  128. package/tests/unit/validation.test.ts +6 -4
  129. package/tests/unit/worktree.test.ts +309 -0
@@ -0,0 +1,130 @@
+ # Outcome: Add Per-Task Execution Metadata + Remove Effort Config
+
+ ## Summary
+ Implemented Obsidian-style frontmatter for plan files with required `effort` metadata, introduced `effortMapping` config section, redefined `models.execute` as a ceiling and fallback, removed the legacy `effort.*` config section entirely, and relaxed planning prompt restrictions.
+
+ ## Key Changes
+
+ ### Types (`src/types/config.ts`)
+ - Removed `EffortConfig`, `EffortScenario`, `EffortLevel` types
+ - Removed `VALID_EFFORTS` constant
+ - Removed `effort` from `RafConfig` interface
+ - Added `TaskEffortLevel` type (`'low' | 'medium' | 'high'`)
+ - Added `EffortMappingConfig` type (`{ low: ClaudeModelName; medium: ClaudeModelName; high: ClaudeModelName }`)
+ - Added `VALID_TASK_EFFORTS` constant (`['low', 'medium', 'high']`)
+ - Updated `DEFAULT_CONFIG` with `effortMapping: { low: 'haiku', medium: 'sonnet', high: 'opus' }`
+
+ ### Config Utilities (`src/utils/config.ts`)
+ - Removed `getEffort()` accessor
+ - Removed effort validation logic
+ - Added `effortMapping` to `VALID_TOP_LEVEL_KEYS`
+ - Added `effortMapping` validation (values must be valid model names)
+ - Added `getEffortMapping()` accessor
+ - Added `resolveEffortToModel(effort)` function
+ - Added `MODEL_TIER_ORDER` constant for tier comparison
+ - Added `getModelTier(modelName)` function:
+   - Returns numeric tier: haiku=1, sonnet=2, opus=3
+   - Extracts family from full model IDs (e.g., `claude-opus-4-6` → opus)
+   - Unknown models default to tier 3 (no cap)
+ - Added `applyModelCeiling(resolvedModel, ceiling?)` function:
+   - Caps resolved model to the ceiling tier
+   - Uses `models.execute` as default ceiling
+
+ ### Frontmatter Parser (`src/utils/frontmatter.ts`) - NEW FILE
+ - Parses Obsidian-style frontmatter from plan file content
+ - Format: `key: value` lines at top, terminated by `---` (no opening delimiter)
+ - Extracts `effort` (required) and `model` (optional) fields
+ - Case-insensitive effort values
+ - Returns warnings for invalid/unknown keys (doesn't throw)
+ - Handles missing delimiter gracefully (returns empty frontmatter)
+ - Detects markdown headings before delimiter (invalid frontmatter)
+
+ ### State Derivation (`src/core/state-derivation.ts`)
+ - Added frontmatter parsing alongside dependency parsing
+ - Extended `DerivedTask` interface with:
+   - `frontmatter?: PlanFrontmatter` - parsed metadata
+   - `frontmatterWarnings?: string[]` - parsing warnings
+
+ ### Do Command (`src/commands/do.ts`)
+ - Removed `getEffort()` usage
+ - Added `resolveTaskModel()` helper function:
+   - Uses explicit `model` frontmatter if present
+   - Falls back to `effort` → `effortMapping` resolution
+   - Applies ceiling using `applyModelCeiling()`
+   - Returns `{ model, source }` for logging
+ - Creates new `ClaudeRunner` per task with resolved model
+ - Logs missing frontmatter warnings
+ - Implements retry escalation: failed tasks retry with ceiling model
+
+ ### Config Command (`src/commands/config.ts`)
+ - Removed `getEffort()` usage and fallback
+
+ ### Claude Runner (`src/core/claude-runner.ts`)
+ - Removed `effortLevel` option from `ClaudeRunnerOptions`
+ - Removed `CLAUDE_CODE_EFFORT_LEVEL` env var injection
+
+ ### Planning Prompts (`src/prompts/planning.ts`, `src/prompts/amend.ts`)
+ - Removed restrictive "Plan Output Style" section
+ - Removed "NO code snippets or implementation details" restrictions
+ - Added frontmatter format requirements with effort assessment guidelines:
+   - `low` — trivial/mechanical changes, simple one-file edits
+   - `medium` — well-scoped features, bug fixes, multi-file changes
+   - `high` — architectural changes, complex logic, deep codebase understanding
+
+ ### Documentation
+ - **`src/prompts/config-docs.md`**:
+   - Removed entire `effort` section
+   - Added `effortMapping` section with defaults and validation rules
+   - Updated `models.execute` description to document ceiling/fallback behavior
+ - **`CLAUDE.md`**:
+   - Updated "Plan File Structure" to include frontmatter format
+   - Documented effort metadata and model resolution
+   - Removed effort config references
+   - Added ceiling behavior documentation
+
+ ### Tests
+ - **`tests/unit/config.test.ts`**:
+   - Removed effort config tests
+   - Added `effortMapping` validation tests
+   - Added `getModelTier()` tests
+   - Added `applyModelCeiling()` tests
+   - Added `resolveEffortToModel()` tests
+ - **`tests/unit/config-command.test.ts`**:
+   - Updated tests to use `effortMapping` instead of `effort`
+ - **`tests/unit/frontmatter.test.ts`** - NEW FILE:
+   - Comprehensive tests for frontmatter parsing
+   - Valid frontmatter tests (effort, model, both)
+   - No frontmatter tests (missing delimiter, empty content, markdown heading)
+   - Warning tests (unknown keys, invalid values)
+   - Edge cases (whitespace, tabs, multiple delimiters)
+ - **`tests/unit/claude-runner.test.ts`**:
+   - Removed `effortLevel` tests
+   - Updated to test environment passing without effort override
+ - **`tests/unit/claude-runner-interactive.test.ts`**:
+   - Updated default model test to accept both short aliases and full model IDs
+   - Updated environment test to not depend on user's env vars
+ - **`tests/unit/validation.test.ts`**:
+   - Updated default model test to accept config-dependent values
+
+ ## Acceptance Criteria Verification
+ - [x] The entire `effort.*` config section is removed (types, defaults, validation, accessors, env var)
+ - [x] `ClaudeRunner` no longer sets `CLAUDE_CODE_EFFORT_LEVEL`
+ - [x] Existing config files with `effort` are handled gracefully (rejected as unknown key with warning)
+ - [x] `effortMapping` config exists with sensible defaults (low→haiku, medium→sonnet, high→opus)
+ - [x] `models.execute` acts as a ceiling — resolved model is capped to this tier
+ - [x] Ceiling works correctly: opus plan + sonnet ceiling = sonnet execution
+ - [x] Under-ceiling works correctly: haiku plan + sonnet ceiling = haiku execution
+ - [x] Retry escalation: failed task retries use the ceiling model
+ - [x] Plan files with frontmatter are parsed correctly (effort and optional model extracted)
+ - [x] Plan files without frontmatter produce a warning and fall back to config model
+ - [x] Effort label in frontmatter correctly maps to a model via `effortMapping`
+ - [x] Explicit `model` in frontmatter takes precedence over `effort` mapping but is still subject to ceiling
+ - [x] Planning prompts no longer restrict implementation details
+ - [x] Planning prompts mandate effort frontmatter with assessment guidelines
+ - [x] Invalid frontmatter values produce a warning but don't block execution
+ - [x] Frontmatter parsing doesn't break existing plan files (backwards compatible)
+ - [x] Tests cover effort removal, effortMapping, ceiling logic, frontmatter parsing, and override logic
+ - [x] All 1273 tests pass
+ - [x] TypeScript builds successfully
+
+ <promise>COMPLETE</promise>
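The resolution rules in the outcome above can be sketched in TypeScript. They cover frontmatter parsing, tier comparison, the ceiling, and retry escalation. This is an illustration under stated assumptions, not the code in `src/utils/frontmatter.ts` or `src/utils/config.ts`:

- `parsePlanFrontmatter`, `modelForAttempt`, and the `source` labels are hypothetical names.
- The signatures of `getModelTier`, `applyModelCeiling`, and `resolveTaskModel` are guessed from the bullet points.
- Treating a line without a colon as a warning is also an assumption.

```typescript
// Illustrative sketch of plan frontmatter parsing and per-task model selection.
// Function names and signatures are assumptions, not RAF's actual exports.

type TaskEffortLevel = "low" | "medium" | "high";
const VALID_TASK_EFFORTS: readonly string[] = ["low", "medium", "high"];

interface PlanFrontmatter {
  effort?: TaskEffortLevel;
  model?: string;
}

// `key: value` lines at the top of the file, closed by a `---` line (no opening
// delimiter). Problems are reported as warnings; nothing throws.
function parsePlanFrontmatter(content: string): { frontmatter: PlanFrontmatter; warnings: string[] } {
  const lines = content.split("\n");
  const end = lines.findIndex((line) => line.trim() === "---");
  if (end === -1) return { frontmatter: {}, warnings: [] }; // no delimiter: no frontmatter
  const header = lines.slice(0, end);
  // A markdown heading before the delimiter means this is not frontmatter.
  if (header.some((line) => line.trim().startsWith("#"))) return { frontmatter: {}, warnings: [] };

  const frontmatter: PlanFrontmatter = {};
  const warnings: string[] = [];
  for (const raw of header) {
    const line = raw.trim();
    if (line === "") continue;
    const colon = line.indexOf(":");
    if (colon === -1) {
      warnings.push(`Malformed frontmatter line: "${line}"`);
      continue;
    }
    const key = line.slice(0, colon).trim();
    const value = line.slice(colon + 1).trim();
    if (key === "effort") {
      const effort = value.toLowerCase(); // effort values are case-insensitive
      if (VALID_TASK_EFFORTS.includes(effort)) frontmatter.effort = effort as TaskEffortLevel;
      else warnings.push(`Invalid effort value: "${value}"`);
    } else if (key === "model") {
      frontmatter.model = value;
    } else {
      warnings.push(`Unknown frontmatter key: "${key}"`);
    }
  }
  return { frontmatter, warnings };
}

// haiku=1, sonnet=2, opus=3; full IDs such as "claude-opus-4-6" match by family
// name. Unknown models get the top tier.
const MODEL_FAMILIES = ["haiku", "sonnet", "opus"];
function getModelTier(modelName: string): number {
  const parts = modelName.toLowerCase().split("-");
  const index = MODEL_FAMILIES.findIndex((family) => parts.includes(family));
  return index === -1 ? 3 : index + 1;
}

// Never run a model above the ceiling tier (models.execute).
function applyModelCeiling(resolved: string, ceiling: string): string {
  return getModelTier(resolved) > getModelTier(ceiling) ? ceiling : resolved;
}

type EffortMapping = Record<TaskEffortLevel, string>;

// Precedence: explicit `model`, then `effort` via effortMapping, then the ceiling itself.
function resolveTaskModel(
  fm: PlanFrontmatter,
  mapping: EffortMapping,
  ceiling: string,
): { model: string; source: "model" | "effort" | "fallback" } {
  if (fm.model) return { model: applyModelCeiling(fm.model, ceiling), source: "model" };
  if (fm.effort) return { model: applyModelCeiling(mapping[fm.effort], ceiling), source: "effort" };
  return { model: ceiling, source: "fallback" };
}

// Retry escalation: the first attempt uses the resolved model, retries use the ceiling.
function modelForAttempt(attempt: number, resolvedModel: string, ceiling: string): string {
  return attempt <= 1 ? resolvedModel : ceiling;
}
```

Consider a plan that begins with `effort: HIGH` followed by `---`. With the default mapping and `models.execute` set to `sonnet`, it resolves to `sonnet` with source `effort`, which matches the "opus plan + sonnet ceiling = sonnet execution" criterion.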
@@ -0,0 +1,36 @@
+ # Task: Remove `claudeCommand` from Config
+
+ ## Objective
+ Remove the `claudeCommand` configuration key entirely, hardcoding `"claude"` as the CLI binary name.
+
+ ## Context
+ The `claudeCommand` config key allows overriding the Claude CLI binary path. In practice this is unnecessary — Claude CLI is always installed as `claude`. Removing it simplifies the config schema and also resolves the PR #4 review comment: with a broken config file, `getClaudeCommand()` could throw before `raf config` launched its repair session. Hardcoding eliminates that failure path.
+
+ ## Requirements
+ - Remove `claudeCommand` from `RafConfig` interface and `DEFAULT_CONFIG` in `src/types/config.ts`
+ - Remove `getClaudeCommand()` accessor from `src/utils/config.ts`
+ - Update `getClaudePath()` in `src/core/claude-runner.ts` to hardcode `"claude"` instead of calling `getClaudeCommand()`
+ - Remove `claudeCommand` from config validation logic in `src/utils/config.ts`
+ - Update `src/prompts/config-docs.md` to remove the `claudeCommand` section
+ - Update any tests that reference `claudeCommand`
+ - Verify `raf config` works correctly even when `~/.raf/raf.config.json` is malformed (this is the PR #4 fix — with hardcoded command, `getClaudePath` no longer depends on config)
+
+ ## Implementation Steps
+ 1. Remove `claudeCommand` from the TypeScript interface and default config
+ 2. Remove the `getClaudeCommand()` helper and update all call sites to use `"claude"` directly
+ 3. Update `getClaudePath()` to use hardcoded `"claude"` in the `which` lookup
+ 4. Remove `claudeCommand` from config validation (the strict validator would then reject it as an unknown key if a user still has it; consider adding a migration warning or silently ignoring it)
+ 5. Update config-docs.md documentation
+ 6. Update/remove affected tests
+ 7. Verify the `raf config` fallback path no longer depends on config file state
+
+ ## Acceptance Criteria
+ - [ ] `claudeCommand` key no longer exists in types, defaults, validation, or documentation
+ - [ ] `getClaudePath()` works without reading any config
+ - [ ] `raf config` launches successfully even with a completely broken config file
+ - [ ] All existing tests pass (updated as needed)
+ - [ ] Config files containing `claudeCommand` are handled gracefully (warning or silent ignore)
+
+ ## Notes
+ - This also addresses the PR #4 review comment about `raf config` being unusable as a repair path when config is malformed. With the hardcoded command, the entire Claude runner initialization is config-independent.
+ - Consider whether to warn users who still have `claudeCommand` in their config or just silently ignore it via validation.
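Step 4 leaves open whether a leftover `claudeCommand` key should produce a warning or be silently ignored. The TypeScript sketch below shows the warning option. `checkTopLevelKeys` and `REMOVED_CONFIG_KEYS` are hypothetical names, and the list of valid keys is passed in rather than read from RAF's real validator.

```typescript
// Sketch: tolerate a retired top-level config key instead of rejecting the file.
// The function name and result shape are illustrative assumptions.

// Keys that used to be valid, mapped to the reason shown to the user.
const REMOVED_CONFIG_KEYS = new Map<string, string>([
  ["claudeCommand", 'the Claude CLI is now always invoked as "claude"'],
]);

interface KeyCheckResult {
  config: Record<string, unknown>; // only the recognised keys
  warnings: string[]; // retired keys that were dropped
  errors: string[]; // genuinely unknown keys
}

function checkTopLevelKeys(raw: Record<string, unknown>, validKeys: readonly string[]): KeyCheckResult {
  const result: KeyCheckResult = { config: {}, warnings: [], errors: [] };
  for (const key of Object.keys(raw)) {
    const removedReason = REMOVED_CONFIG_KEYS.get(key); // Map avoids prototype keys like "toString"
    if (validKeys.includes(key)) {
      result.config[key] = raw[key];
    } else if (removedReason !== undefined) {
      result.warnings.push(`Ignoring removed config key "${key}": ${removedReason}`);
    } else {
      result.errors.push(`Unknown config key "${key}"`);
    }
  }
  return result;
}
```

Separating retired keys from unknown ones keeps strict rejection for typos. At the same time, users who still carry `claudeCommand` get an explanation instead of a hard error.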
@@ -0,0 +1,33 @@
+ # Task: Fix Mixed-Attempt Cost Underreporting
+
+ ## Objective
+ Fix cost calculation to compute cost per-attempt rather than on accumulated usage, preventing underreporting when attempts have mixed aggregate-only and per-model usage data.
+
+ ## Context
+ `TokenTracker.addTask()` currently calls `accumulateUsage(attempts)` to merge all attempts into one `UsageData`, then calls `calculateCost()` on the merged result. The problem: if some attempts have `modelUsage` populated and others only have aggregate fields (which `extractUsageData` allows), the merged result has a non-empty `modelUsage` map. `calculateCost()` then takes the per-model branch and only prices tokens in `modelUsage`, silently dropping aggregate-only tokens from attempts that lacked `modelUsage`. This means mixed-attempt retries underreport cost.
+
+ ## Requirements
+ - Calculate cost independently for each attempt's `UsageData`
+ - Each attempt uses per-model pricing if it has `modelUsage`, or aggregate-fallback (Sonnet rates) if it doesn't
+ - Sum the per-attempt costs to get the task total
+ - The per-attempt cost calculation should also be available for the display formatter (it already receives a `calculateAttemptCost` callback)
+ - Preserve the accumulated usage totals for token count display (input/output/cache totals should still be summed across attempts)
+
+ ## Implementation Steps
+ 1. Modify `addTask()` in `TokenTracker` to calculate cost per-attempt, then sum into the task's `CostBreakdown`
+ 2. Ensure `calculateCost()` is called on individual attempt `UsageData` objects, not on the accumulated merge
+ 3. Update the `CostBreakdown` aggregation to sum per-attempt breakdowns
+ 4. Verify that `formatTaskTokenSummary()` still works correctly — it receives per-attempt cost via callback, so the callback should use single-attempt `calculateCost()`
+ 5. Add test cases covering the mixed-attempt scenario: one attempt with `modelUsage`, another with only aggregate fields
+
+ ## Acceptance Criteria
+ - [ ] Cost is calculated per-attempt, not on merged usage
+ - [ ] Mixed attempts (some with modelUsage, some without) report accurate total cost
+ - [ ] Per-attempt display in multi-attempt summaries shows correct individual costs
+ - [ ] Grand total cost across all tasks remains accurate
+ - [ ] New test cases cover the mixed-attempt edge case
+ - [ ] Existing token tracking tests still pass
+
+ ## Notes
+ - The key insight: `accumulateUsage()` is fine for summing token counts for display, but cost calculation must happen before merging to respect the per-model vs. aggregate distinction per attempt.
+ - The `formatTaskTokenSummary` already accepts a `calculateAttemptCost` callback — this callback should call `calculateCost` on individual attempt data, which is already the correct granularity.
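The fix described above amounts to moving the pricing step before the merge. The TypeScript sketch below shows that ordering. It makes three simplifying assumptions:

- The field names are illustrative.
- The `pricingFor` lookup is a stand-in for however RAF finds a model's prices.
- It returns a single dollar total in place of RAF's `CostBreakdown`.

```typescript
// Sketch: price each attempt on its own and sum, instead of pricing merged usage.
// Field names and the pricing lookup are assumptions, not RAF's actual types.

interface TokenCounts {
  inputTokens: number;
  outputTokens: number;
  cacheReadTokens: number;
  cacheCreateTokens: number;
}

// Aggregate counts, plus an optional per-model split.
interface UsageData extends TokenCounts {
  modelUsage?: Record<string, TokenCounts>;
}

// USD per million tokens.
interface ModelPricing {
  input: number;
  output: number;
  cacheRead: number;
  cacheCreate: number;
}

function priceTokens(t: TokenCounts, p: ModelPricing): number {
  return (
    t.inputTokens * p.input +
    t.outputTokens * p.output +
    t.cacheReadTokens * p.cacheRead +
    t.cacheCreateTokens * p.cacheCreate
  ) / 1e6;
}

// One attempt: per-model pricing when modelUsage is populated, otherwise the
// aggregate fields at the fallback (Sonnet) rate.
function calculateAttemptCost(
  attempt: UsageData,
  pricingFor: (model: string) => ModelPricing,
  fallback: ModelPricing,
): number {
  const perModel = attempt.modelUsage ?? {};
  const models = Object.keys(perModel);
  if (models.length === 0) return priceTokens(attempt, fallback);
  return models.reduce((sum, m) => sum + priceTokens(perModel[m], pricingFor(m)), 0);
}

// Task total = sum of per-attempt costs, so aggregate-only attempts are never dropped.
function calculateTaskCost(
  attempts: UsageData[],
  pricingFor: (model: string) => ModelPricing,
  fallback: ModelPricing,
): number {
  return attempts.reduce((sum, a) => sum + calculateAttemptCost(a, pricingFor, fallback), 0);
}
```

Take two attempts with example input rates of $5 per million for the per-model entry and $3 per million for the fallback:

- one carrying 1,000 input tokens in `modelUsage`
- one aggregate-only attempt with 2,000 input tokens

This sketch totals them at $0.011. Pricing the merged usage would instead take the per-model branch and report only $0.005.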
@@ -0,0 +1,82 @@
+ # Task: Add 5h Window Rate Limit Estimation + Plan Session Token Tracking
+
+ ## Objective
+ Add an estimated percentage of the 5-hour rate limit window consumed, displayed after each task and in the grand total summary. Also add token usage tracking and display for `raf plan` interactive sessions.
+
+ ## Dependencies
+ 02
+
+ ## Context
+ Anthropic's subscription plans use a shared credit pool per 5-hour window. The pool is measured in cost-weighted credits, not raw token count. Heavier models (Opus) consume the pool faster than lighter ones (Haiku) in proportion to their API pricing ratios. Users need visibility into how much of their 5-hour window they've consumed during a RAF session.
+
+ The baseline is 88,000 Sonnet-equivalent tokens per 5h window. All token usage is normalized to Sonnet-equivalent tokens using the API pricing ratios:
+ - Haiku input/output costs ~1/3 of Sonnet → 1 Haiku token ≈ 0.33 Sonnet tokens
+ - Opus input/output costs ~1.67× of Sonnet → 1 Opus token ≈ 1.67 Sonnet tokens
+ - Cache read/create tokens follow the same model-specific pricing ratios
+
+ ## Requirements
+
+ ### Rate Limit Estimation (raf do)
+ - Convert all token usage to "Sonnet-equivalent tokens" using the configured pricing ratios
+ - The conversion formula: `sonnetEquivalentTokens = actualCost / sonnetCostPerToken` (where sonnet cost per token is derived from the configured Sonnet pricing)
+ - **Per-attempt model awareness**: task 08 introduces per-task model selection and retry escalation (a task may start with haiku and retry with sonnet/opus). Cost and rate limit calculations must use the actual model that ran each attempt, not a single model for the whole task. This is already handled if cost is calculated per-attempt (task 02), but the rate limit conversion must also use the correct per-attempt pricing
+ - Display estimated 5h window percentage after each task (alongside existing token summary)
+ - Display cumulative 5h window percentage in the grand total summary
+ - New config keys under `display` section:
+   - `display.showRateLimitEstimate` (boolean, default: `true`) — toggle showing the % estimate
+   - `display.showCacheTokens` (boolean, default: `true`) — toggle showing cache token counts in summaries
+ - New config key for the baseline cap:
+   - `rateLimitWindow.sonnetTokenCap` (number, default: `88000`) — the Sonnet-equivalent token cap for the 5h window
+ - The percentage is a rough estimate — make this clear in the display (e.g., "~42% of 5h window")
+
+ ### Token Tracking for Plan Sessions (raf plan)
+ - After the `raf plan` interactive session ends, display a token usage summary (input/output tokens, cache, estimated cost, 5h window %)
+ - Approach: Claude CLI saves session data to `~/.claude/projects/<escaped-path>/<session-id>.jsonl` — each assistant message entry contains usage data (input_tokens, output_tokens, cache tokens, model name)
+ - Pass `--session-id <uuid>` to `runInteractive()` so we know exactly which session file to read after the session ends
+ - After `runInteractive()` returns, locate and parse the session JSONL file to extract and accumulate all usage data from assistant message entries
+ - The session file path is `~/.claude/projects/<escaped-project-path>/<session-id>.jsonl` where the project path is escaped by replacing `/` with `-`
+ - Reuse the existing `TokenTracker` and display formatters to show the summary
+ - This also applies to `raf plan --amend` sessions
+
+ ## Implementation Steps
+
+ ### Rate Limit Estimation
+ 1. Add new config types: `display` section with `showRateLimitEstimate` and `showCacheTokens` booleans; `rateLimitWindow` section with `sonnetTokenCap` number
+ 2. Add defaults to `DEFAULT_CONFIG`, validation rules, and config accessor helpers
+ 3. Update config-docs.md with the new keys
+ 4. Implement the Sonnet-equivalent conversion in `TokenTracker` — the simplest approach: use the total estimated cost (already calculated) divided by the Sonnet cost-per-token to get Sonnet-equivalent tokens
+ 5. Add a method to `TokenTracker` to compute cumulative 5h window percentage
+ 6. Update `formatTaskTokenSummary()` to optionally append the window percentage
+ 7. Update `formatTokenTotalSummary()` to optionally show the cumulative window percentage
+ 8. Respect the `display.showRateLimitEstimate` and `display.showCacheTokens` config flags in the formatters
+
+ ### Plan Session Token Tracking
+ 9. Modify `runInteractive()` in `claude-runner.ts` to accept an optional `sessionId` parameter and pass it as `--session-id <uuid>` to the Claude CLI spawn
+ 10. In `plan.ts` (both plan and amend flows), generate a UUID before calling `runInteractive()` and pass it
+ 11. Create a utility to locate and parse the Claude session JSONL file: read `~/.claude/projects/<escaped-path>/<session-id>.jsonl`, extract usage data from all assistant message entries, and accumulate into a `UsageData` structure
+ 12. After `runInteractive()` returns in `plan.ts`, call the session parser, feed results to `TokenTracker`, and display the summary using existing formatters
+ 13. Handle edge cases: session file not found (Claude CLI may change storage), malformed entries, zero usage
+
+ ### Tests
+ 14. Add tests for the conversion logic, display formatting, and session file parsing
+
+ ## Acceptance Criteria
+ - [ ] After each task, the token summary includes `~X% of 5h window` when enabled
+ - [ ] Grand total summary includes cumulative `~X% of 5h window` when enabled
+ - [ ] Percentage correctly reflects cost-weighted usage (Opus tasks consume more % than Haiku tasks for same raw token count)
+ - [ ] Multi-model tasks (retry escalation) correctly account for different models across attempts in both cost and rate limit calculations
+ - [ ] `display.showRateLimitEstimate: false` hides the percentage
+ - [ ] `display.showCacheTokens: false` hides cache read/create token counts from summaries
+ - [ ] `rateLimitWindow.sonnetTokenCap` correctly adjusts the denominator
+ - [ ] Config validation accepts the new keys
+ - [ ] Config docs updated with new keys and explanation
+ - [ ] After `raf plan` interactive session, a token usage summary is displayed
+ - [ ] After `raf plan --amend` interactive session, a token usage summary is displayed
+ - [ ] Session file parsing handles missing/malformed files gracefully (warn, don't crash)
+ - [ ] Tests cover the conversion math, display toggling, and session file parsing
+
+ ## Notes
+ - The percentage is inherently an estimate — the actual Anthropic rate limit algorithm may differ. The display should communicate this (tilde prefix).
+ - The conversion can be simplified by reusing the already-computed dollar cost: `sonnetEquivalentTokens = totalCost / ((sonnetInputPrice + sonnetOutputPrice) / 2M)`, where prices are per million tokens, so the denominator is Sonnet's average per-token price. A more accurate approach would normalize input and output tokens separately, since they have different price ratios. Consider which approach is more appropriate.
+ - This task depends on task 02 (fix mixed-attempt cost) because accurate cost calculation is the foundation for accurate percentage estimation.
+ - Task 08 introduces per-task model selection and retry escalation (cheaper model on first attempt, ceiling model on retry). The per-attempt cost calculation from task 02 already handles different models per attempt via `modelUsage`, but the Sonnet-equivalent conversion for rate limit % must also respect per-attempt model differences. Since conversion is derived from cost (which is already per-attempt), this should work naturally — but verify with a test case covering a multi-model retry scenario.
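The TypeScript sketch below covers three pieces of this plan: the simpler conversion from the notes, the window estimate, and the session-file handling. All names are hypothetical, and several details are assumptions:

- **Entry shape.** The sketch expects `type`, `message.id`, `message.model`, and `message.usage` with `input_tokens`, `output_tokens`, `cache_read_input_tokens`, and `cache_creation_input_tokens`. This is a guess about the Claude CLI's session files, which the plan itself notes may change.
- **Repeated ids.** Counting a repeated `message.id` only once is a defensive guess, not a known requirement.
- **Path escaping.** This follows the plan's description (`/` replaced by `-`).

```typescript
// Sketch: Sonnet-equivalent conversion, 5h-window estimate, and session-file parsing.
// Names, the JSONL entry shape, and the dedupe rule are assumptions for illustration.

// Simple approach from the notes: divide the dollar cost by Sonnet's average
// per-token price (prices are USD per million tokens).
function toSonnetEquivalentTokens(costUsd: number, sonnetInputPrice: number, sonnetOutputPrice: number): number {
  return costUsd / ((sonnetInputPrice + sonnetOutputPrice) / 2e6);
}

function formatWindowEstimate(sonnetEquivalentTokens: number, sonnetTokenCap = 88000): string {
  const percent = (sonnetEquivalentTokens / sonnetTokenCap) * 100;
  return `~${Math.round(percent)}% of 5h window`; // tilde marks it as an estimate
}

// <home>/.claude/projects/<project path with "/" replaced by "-">/<session-id>.jsonl
function sessionFilePath(homeDir: string, projectPath: string, sessionId: string): string {
  return `${homeDir}/.claude/projects/${projectPath.replace(/\//g, "-")}/${sessionId}.jsonl`;
}

interface SessionTokens {
  input: number;
  output: number;
  cacheRead: number;
  cacheCreate: number;
}

// Assumed entry shape: {"type":"assistant","message":{"id","model","usage":{...}}}.
interface SessionEntry {
  type?: string;
  message?: { id?: string; model?: string; usage?: Record<string, number | undefined> };
}

// Accumulates usage per model from assistant entries. Malformed lines are counted
// and skipped; a repeated message id is counted once, defensively.
function parseSessionUsage(jsonl: string): { byModel: Map<string, SessionTokens>; skippedLines: number } {
  const byModel = new Map<string, SessionTokens>();
  const seenIds = new Set<string>();
  let skippedLines = 0;
  for (const line of jsonl.split("\n")) {
    if (line.trim() === "") continue;
    let entry: SessionEntry | null = null;
    try {
      entry = JSON.parse(line) as SessionEntry | null;
    } catch {
      skippedLines++;
      continue;
    }
    if (!entry || entry.type !== "assistant") continue;
    const usage = entry.message?.usage;
    if (!usage) continue;
    const id = entry.message?.id;
    if (id !== undefined) {
      if (seenIds.has(id)) continue;
      seenIds.add(id);
    }
    const model = entry.message?.model ?? "unknown";
    const totals = byModel.get(model) ?? { input: 0, output: 0, cacheRead: 0, cacheCreate: 0 };
    totals.input += usage.input_tokens ?? 0;
    totals.output += usage.output_tokens ?? 0;
    totals.cacheRead += usage.cache_read_input_tokens ?? 0;
    totals.cacheCreate += usage.cache_creation_input_tokens ?? 0;
    byModel.set(model, totals);
  }
  return { byModel, skippedLines };
}
```

To get the session cost, price each model's totals from `parseSessionUsage` and sum them. That cost then feeds the same conversion as a `raf do` task.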
@@ -0,0 +1,32 @@
+ # Task: Show RAF Version and Model in `raf do` Logs
+
+ ## Objective
+ Display the RAF version and execution model in a single combined line at the start of `raf do` execution.
+
+ ## Context
+ Currently `raf do` logs don't prominently show what version of RAF is running or which model is being used for execution. This information is useful for debugging and for users to confirm their setup. The model name should be shown in its full format (e.g., `claude-opus-4-6` rather than just `opus`).
+
+ ## Requirements
+ - Display a single combined line at the start of task execution, before any tasks run
+ - Format: `RAF v{version} | Model: {fullModelId}`
+ - Version comes from `package.json` via the existing `getVersion()` utility
+ - Model should be the full model ID (e.g., `claude-opus-4-6`), not the short alias
+ - Model should be the full model ID (e.g., `claude-opus-4-6`), not the short alias
14
+ - Use the existing logger formatting (e.g., `logger.info` or appropriate level)
15
+ - Do NOT show effort level — the `effort.*` config is being removed in task 08
16
+
17
+ ## Implementation Steps
18
+ 1. In `src/commands/do.ts`, add a log line at the start of the execution flow (before the first task begins)
19
+ 2. Resolve the full model ID — if the config uses a short alias like `opus`, resolve it to the full model ID
20
+ 3. Format and display the combined line
21
+ 4. Ensure this appears in both worktree and non-worktree execution modes
22
+
23
+ ## Acceptance Criteria
24
+ - [ ] A version/model line appears at the start of every `raf do` execution
25
+ - [ ] Model name is shown in full format (e.g., `claude-opus-4-6`)
26
+ - [ ] Line appears before any task execution output
27
+ - [ ] Works in both worktree and non-worktree modes
28
+
29
+ ## Notes
30
+ - The existing `getVersion()` utility is in `src/utils/version.ts`.
31
+ - Model resolution from short alias to full ID may already exist in the codebase — check how `ClaudeRunner` resolves model names.
32
+ - Keep the display subtle (dim or info level) so it doesn't clutter the output.
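Putting the header format in a pure function makes it testable apart from `getVersion()` and the logger. In this TypeScript sketch, `formatRunHeader` and `resolveFullModelId` are hypothetical names, and the alias table is passed in. The only pair taken from the plan is `opus` → `claude-opus-4-6`. Any other entries would have to come from wherever `ClaudeRunner` already resolves model names.

```typescript
// Sketch: the single start-of-run line for `raf do`.
// The alias table is supplied by the caller; names here are hypothetical.

function resolveFullModelId(model: string, aliases: Readonly<Record<string, string>>): string {
  // Already a full ID (or an unknown name): show it unchanged.
  return Object.prototype.hasOwnProperty.call(aliases, model) ? aliases[model] : model;
}

function formatRunHeader(version: string, model: string, aliases: Readonly<Record<string, string>>): string {
  const bareVersion = version.replace(/^v/, ""); // avoid "vv…" if the caller already prefixed it
  return `RAF v${bareVersion} | Model: ${resolveFullModelId(model, aliases)}`;
}
```

The plan does not settle which model to show once task 08's per-task selection exists; the `models.execute` ceiling is the likely candidate. This sketch simply formats whatever model it is given.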
@@ -0,0 +1,40 @@
+ # Task: Sync Main Branch Before Worktree/PR Operations
+
+ ## Objective
+ Automatically push main to remote before creating a PR and pull main from remote before creating a git worktree, with a configurable toggle.
+
+ ## Context
+ When working with worktrees, the worktree is branched from the current state of the main branch. If main is behind the remote, the worktree starts from stale code. Similarly, before creating a PR, the main branch should be pushed to ensure the remote has the latest state for the PR base. Auto-detecting the main branch (from `origin/HEAD`) avoids hardcoding assumptions about branch naming.
+
+ ## Requirements
+ - Before creating a worktree (`raf plan --worktree` or `raf do --worktree`): pull the main branch from remote to ensure the worktree starts from the latest code
+ - Before creating a PR (post-execution "Create PR" action): push the main branch to remote so the PR base is up to date
+ - Auto-detect the main branch name from `refs/remotes/origin/HEAD` (the same detection logic used in `pull-request.ts` via `detectBaseBranch()`)
+ - New config key: `syncMainBranch` (boolean, default: `true`)
+ - When `syncMainBranch` is `false`, skip both push and pull operations
+ - Handle failures gracefully: if push/pull fails (e.g., no remote, auth issues), warn but don't block the operation
+
+ ## Implementation Steps
+ 1. Add `syncMainBranch` config key to `RafConfig` interface, `DEFAULT_CONFIG`, validation, and config-docs.md
+ 2. Add `getSyncMainBranch()` accessor in `src/utils/config.ts`
+ 3. Reuse or extract `detectBaseBranch()` from `src/core/pull-request.ts` into a shared utility (it's already used for PR base detection)
+ 4. Add a `syncMainBranch()` utility function that pulls main before worktree creation
22
+ 5. Integrate pull into the worktree creation flow in `src/core/worktree.ts` or the calling code in `do.ts`/`plan` command
23
+ 6. Integrate push into the PR creation flow — before `createPullRequest()` is called, push main
24
+ 7. Add appropriate logging (info level) when syncing occurs
25
+ 8. Handle errors: catch failures, log warning, continue with the operation
26
+ 9. Update config-docs.md
27
+
28
+ ## Acceptance Criteria
29
+ - [ ] Main branch is pulled from remote before worktree creation (when `syncMainBranch: true`)
30
+ - [ ] Main branch is pushed to remote before PR creation (when `syncMainBranch: true`)
31
+ - [ ] Main branch name is auto-detected from `origin/HEAD`
32
+ - [ ] `syncMainBranch: false` skips both operations
33
+ - [ ] Failures in push/pull produce warnings but don't block the workflow
34
+ - [ ] Config validation accepts the new key
35
+ - [ ] Config docs updated
36
+
37
+ ## Notes
38
+ - `detectBaseBranch()` in `src/core/pull-request.ts` already handles the `origin/HEAD` detection with fallback to `main`/`master`. Reuse this logic rather than duplicating it.
39
+ - The pull should only pull the main branch, not the current branch or all branches.
40
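+ - Be careful about the pull: if the user has uncommitted changes on main, a pull could fail. Consider using `git fetch origin main && git merge --ff-only origin/main`, which fails cleanly if main has diverged. Note that the merge updates whichever branch is currently checked out; when main is not the current branch, `git fetch origin main:main` fast-forwards the local `main` ref instead (and is rejected if the update is not a fast-forward).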
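A minimal TypeScript sketch of the sync step, written as pure functions that return the git commands to run so the decision logic can be tested without a repository. It takes the main branch name as input (assumed to come from the existing `detectBaseBranch()`, per the note about not duplicating detection); `planMainPull`, `planMainPush`, and `planSync` are hypothetical names.

```typescript
type GitCommand = string[];

// Commands that bring the local main branch up to date with origin without
// merging anything into the branch the user is on.
function planMainPull(mainBranch: string, currentBranch: string): GitCommand[] {
  if (currentBranch === mainBranch) {
    // main is checked out: fast-forward it in place. The merge exits
    // non-zero if main has diverged from origin.
    return [
      ["git", "fetch", "origin", mainBranch],
      ["git", "merge", "--ff-only", `origin/${mainBranch}`],
    ];
  }
  // main is not checked out: update refs/heads/<main> directly. Git rejects
  // a non-fast-forward update here, and fetch never touches the working tree.
  return [["git", "fetch", "origin", `${mainBranch}:${mainBranch}`]];
}

// Push main so the PR base on the remote matches the local branch.
function planMainPush(mainBranch: string): GitCommand[] {
  return [["git", "push", "origin", mainBranch]];
}

// Honour the syncMainBranch toggle: no commands at all when it is false.
function planSync(enabled: boolean, commands: GitCommand[]): GitCommand[] {
  return enabled ? commands : [];
}
```

A caller would run each command in order (for example with Node's `child_process.spawnSync`) and, per the requirements, log a warning and carry on if any of them exits non-zero.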
@@ -0,0 +1,61 @@
1
+ # Task: Sync README with Codebase (Critical Items)
2
+
3
+ ## Objective
4
+ Fix critical discrepancies between README.md and the actual codebase implementation.
5
+
6
+ ## Context
7
+ The README documents features that no longer exist (like the `--merge` flag) and is missing documentation for major features (post-execution picker, PR creation). This causes user confusion and makes the tool harder to adopt.
8
+
9
+ ## Dependencies
10
+ 01, 05
11
+
12
+ ## Requirements
13
+ Fix these three critical discrepancies:
14
+
15
+ ### 1. Remove `--merge` flag references
16
+ The `--merge` CLI flag for `raf do` is documented in the README but does not exist in the code. It was replaced by an interactive post-execution action picker. All references to `--merge` must be removed and replaced with the actual behavior.
17
+
18
+ Affected locations in README:
19
+ - Usage examples showing `raf do my-feature -w --merge`
20
+ - Command Reference table listing `--merge` as a flag
21
+ - Any other mentions of `--merge`
22
+
23
+ ### 2. Document the post-execution action picker
24
+ When running `raf do` in worktree mode, an interactive picker appears BEFORE task execution asking what to do after tasks complete. The three options are:
25
+ - **Merge** — merge branch into the original branch (fast-forward preferred, merge-commit fallback)
26
+ - **Create PR** — push branch and create a GitHub PR
27
+ - **Leave branch** — keep the branch as-is, do nothing
28
+
29
+ This is implemented in `src/commands/do.ts` via `pickPostExecutionAction()`. On task failure, the chosen post-action is skipped. After successful post-actions (merge, PR, leave), the worktree directory is cleaned up automatically (the git branch is preserved).
30
+
31
+ ### 3. Document PR creation from worktree
32
+ The "Create PR" post-execution action is a significant feature not mentioned in the README at all. It:
33
+ - Requires the `gh` CLI to be installed and authenticated
34
+ - Auto-detects the base branch from `origin/HEAD`
35
+ - Generates a PR title from the project name
36
+ - Generates a PR body using Claude summarizing input.md, decisions.md, and outcomes
37
+ - Auto-pushes the branch to origin if needed
38
+ - Runs preflight checks; falls back to "leave branch" if `gh` is missing or unauthenticated
39
+
40
+ Also fix the worktree cleanup description — the README currently says worktrees persist and need manual cleanup, but they're actually auto-cleaned after post-actions (only the git branch is preserved). On failure, the worktree IS kept for inspection.
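The post-execution rules in sections 2 and 3 can be summarized as a small decision function. This is a TypeScript sketch only: `planPostAction`, `shouldRemoveWorktree`, and the `ghReady` flag (standing in for the preflight check) are hypothetical names, not the code in `do.ts`. It also assumes a failed post-action keeps the worktree, since the text only states the success case explicitly.

```typescript
type PostAction = "merge" | "pr" | "leave";

// Decide which post-action actually runs. The user's choice is made before
// any task executes; `ghReady` stands in for the gh preflight check.
function planPostAction(chosen: PostAction, tasksSucceeded: boolean, ghReady: boolean): PostAction | null {
  if (!tasksSucceeded) return null; // task failure: skip the chosen post-action
  if (chosen === "pr" && !ghReady) return "leave"; // gh missing or unauthenticated
  return chosen;
}

// The worktree directory is removed only after the tasks and the post-action
// both succeed; otherwise it is kept for inspection. The git branch is
// preserved in every case.
function shouldRemoveWorktree(tasksSucceeded: boolean, postActionSucceeded: boolean): boolean {
  return tasksSucceeded && postActionSucceeded;
}
```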
41
+
42
+ ## Implementation Steps
43
+ 1. Read the current README.md thoroughly
44
+ 2. Remove all `--merge` flag references from examples and command reference
45
+ 3. Update the Worktree Mode section to describe the post-execution picker flow
46
+ 4. Add documentation about PR creation capability and its requirements (`gh` CLI)
47
+ 5. Fix the worktree cleanup description to reflect auto-cleanup behavior
48
+ 6. Ensure examples in the README use valid, existing flags only
49
+ 7. Review the updated text for consistency and accuracy
50
+
51
+ ## Acceptance Criteria
52
+ - [ ] No references to `--merge` flag remain in README
53
+ - [ ] Post-execution action picker is documented with all three options
54
+ - [ ] PR creation from worktree is documented including prerequisites
55
+ - [ ] Worktree cleanup behavior is accurately described
56
+ - [ ] All CLI examples use valid, existing flags
57
+ - [ ] README reads naturally and doesn't feel patched
58
+
59
+ ## Notes
60
+ - This task depends on 01 (remove `claudeCommand`) and 05 (sync main branch) because those tasks add or change config keys that should be reflected wherever the README mentions them.
61
+ - Only fix the critical items listed above. Medium- and low-priority discrepancies (missing verbose flag in the Command Reference table, blocked symbol, token tracking docs, effort/pricing docs) are out of scope for this task.
@@ -0,0 +1,28 @@
1
+ # Task: Add --no-session-persistence to Throwaway Claude Calls
2
+
3
+ ## Objective
4
+ Prevent PR body generation and failure analysis Claude calls from polluting the user's session history.
5
+
6
+ ## Context
7
+ Claude CLI saves every session to disk by default, making them appear in `claude --resume`. Throwaway utility calls (PR body generation, failure analysis) clutter this history with sessions the user will never want to resume. The name generation utility already solved this by adding `--no-session-persistence` to its `spawn()` call (implemented in the token-reaper project). The same pattern should be applied to the remaining throwaway Claude invocations.
8
+
9
+ ## Requirements
10
+ - Add `--no-session-persistence` flag to the `spawn()` call in `callClaudeForPrBody()` in `src/core/pull-request.ts`
11
+ - Add `--no-session-persistence` flag to the `spawn()` call in the failure analyzer in `src/core/failure-analyzer.ts`
12
+ - Both already use `-p` (print mode), which is required for `--no-session-persistence` to work
13
+ - Follow the exact same pattern used in `src/utils/name-generator.ts`
14
+
15
+ ## Implementation Steps
16
+ 1. In `src/core/pull-request.ts`, add `'--no-session-persistence'` to the args array in `callClaudeForPrBody()`
17
+ 2. In `src/core/failure-analyzer.ts`, add `'--no-session-persistence'` to the args array in the Claude spawn call
18
+ 3. Verify both functions still work correctly — the flag should be transparent to the output
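One way to keep both call sites consistent with `name-generator.ts` is a tiny helper that appends the flag and refuses to do so without print mode. This is a hypothetical TypeScript sketch: it assumes the existing args arrays contain `-p` (or its long form `--print`) and does not model how the prompt or model are passed. Adding the literal `'--no-session-persistence'` string to each args array, as the steps describe, achieves the same result.

```typescript
// Append --no-session-persistence to an existing Claude CLI args array.
// Throws if print mode is absent, since the flag only works with -p.
function withNoSessionPersistence(args: readonly string[]): string[] {
  if (!args.includes("-p") && !args.includes("--print")) {
    throw new Error("--no-session-persistence requires -p (print mode)");
  }
  // Idempotent: never add the flag twice.
  return args.includes("--no-session-persistence")
    ? [...args]
    : [...args, "--no-session-persistence"];
}
```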
19
+
20
+ ## Acceptance Criteria
21
+ - [ ] PR body generation sessions don't appear in `claude --resume`
22
+ - [ ] Failure analysis sessions don't appear in `claude --resume`
23
+ - [ ] Both features still function correctly (output unchanged)
24
+ - [ ] Pattern matches the existing implementation in `name-generator.ts`
25
+
26
+ ## Notes
27
+ - This is a minimal two-line change (one per file). The flag is well-tested in name-generator.ts already.
28
+ - The `--no-session-persistence` flag only works with `-p` (print mode), which both call sites already use.
@@ -0,0 +1,123 @@
1
+ # Task: Add Per-Task Execution Metadata + Remove Effort Config
2
+
3
+ ## Objective
4
+ Add Obsidian-style frontmatter to plan files with required effort metadata, introduce an effort-to-model mapping config, redefine `models.execute` as a ceiling, remove the legacy `effort.*` config section entirely, and relax planning prompt restrictions.
5
+
6
+ ## Context
7
+ The philosophy is "plan with a smart model, execute with a less smart model when possible." Currently, all tasks execute with the same globally-configured model. By adding frontmatter metadata to plan files during the planning stage, the planner (typically Opus) can assess each task's complexity and recommend the appropriate execution model.
8
+
9
+ The config's role shifts from "pick one model for all tasks" to "set the budget ceiling." The planner recommends effort per task, which maps to a model via `effortMapping`. The final model is capped by `models.execute` (the ceiling). This gives users budget control while letting the planner differentiate tasks.
10
+
11
+ The existing `effort.*` config section (which maps to Claude CLI's `--effort` flag via `CLAUDE_CODE_EFFORT_LEVEL` env var) should be removed entirely — it's a separate concept from the task complexity "effort" label in plan frontmatter.
12
+
13
+ Additionally, the planning prompts currently contain restrictive wording that discourages implementation details in plans. This wording should be removed to allow the planning model to include whatever level of detail it deems appropriate.
14
+
15
+ ## Dependencies
16
+ 04
17
+
18
+ ## Requirements
19
+
20
+ ### Frontmatter Metadata in Plan Files
21
+ - Plan files MUST have Obsidian-style properties at the top, before the `# Task:` heading
22
+ - Format uses only a closing `---` delimiter (no opening delimiter):
23
+ ```
24
+ effort: medium
25
+ ---
26
+ # Task: ...
27
+ ```
28
+ - `effort` is REQUIRED: a human-readable task complexity label (low/medium/high) that maps to a model. It is NOT Claude's `--effort` flag
29
+ - `model` is OPTIONAL — an explicit model override (short alias or full model ID) that bypasses the effort mapping entirely
30
+ - If both `model` and `effort` are present, `model` takes precedence over the effort mapping
31
+ - If frontmatter is missing (e.g., manually created plans), warn and fall back to `models.execute`
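A minimal TypeScript sketch of a lenient parser for this format (`key: value` lines before the closing `---`), following the leniency rules in the Notes: unknown keys are ignored, and a missing delimiter or a malformed property means "no frontmatter" rather than an error. `parsePlanFrontmatter` and `PlanFrontmatter` are hypothetical names, and tolerating a YAML-style opening `---` is an extra assumption beyond the stated format. Validating the values (for example, warning on an unknown effort label) is left to the caller.

```typescript
interface PlanFrontmatter {
  effort?: string;
  model?: string;
}

// Parse "key: value" lines terminated by a "---" line, e.g.
//   effort: medium
//   ---
//   # Task: ...
// Returns null (treated as "no frontmatter") instead of throwing.
function parsePlanFrontmatter(content: string): PlanFrontmatter | null {
  const result: PlanFrontmatter = {};
  let sawProperty = false;
  let sawOpening = false;
  for (const raw of content.split(/\r?\n/)) {
    const line = raw.trim();
    if (line === "---") {
      // Tolerate a YAML-style opening delimiter before any property.
      if (!sawProperty && !sawOpening) {
        sawOpening = true;
        continue;
      }
      return result;
    }
    if (line === "") continue;
    if (line.startsWith("#")) return null; // reached a heading: no closing delimiter
    const match = /^([A-Za-z][\w-]*)\s*:\s*(.*)$/.exec(line);
    if (!match) return null; // malformed property
    sawProperty = true;
    const key = match[1].toLowerCase();
    const value = match[2].trim();
    if (key === "effort") result.effort = value.toLowerCase();
    else if (key === "model") result.model = value;
    // Unknown keys are ignored.
  }
  return null; // no closing delimiter at all
}
```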
32
+
33
+ ### Effort-to-Model Mapping
34
+ - New config section: `effortMapping` that maps complexity labels to model names
35
+ - Default mapping: `{ low: "haiku", medium: "sonnet", high: "opus" }`
36
+ - When a plan has `effort: medium`, RAF resolves it to the model from the mapping (e.g., sonnet)
37
+ - The mapping values follow the same validation as model names (short aliases or full model IDs)
38
+ - Add to `DEFAULT_CONFIG`, validation, config-docs.md
39
+
40
+ ### Config as Ceiling + Fallback
41
+ - `models.execute` serves dual purpose:
42
+ 1. **Ceiling**: the maximum model tier allowed for task execution
43
+ 2. **Fallback**: the model used when a plan has no effort frontmatter (e.g., manually created plans, legacy plans)
44
+ - Model tier ordering: haiku < sonnet < opus (based on pricing — cheaper = lower tier)
45
+ - When frontmatter IS present: final model = `min(resolved_model, models.execute)` where "min" means the cheaper/lower-tier model
46
+ - When frontmatter is MISSING: final model = `models.execute` directly (with a warning about missing frontmatter)
47
+ - Example: if `models.execute: "sonnet"` (ceiling) and plan says `effort: high` (maps to opus), the task runs with sonnet (capped)
48
+ - Example: if `models.execute: "sonnet"` and plan says `effort: low` (maps to haiku), the task runs with haiku (under ceiling, no cap)
49
+ - Example: if plan has no frontmatter, the task runs with sonnet (fallback)
50
+ - The explicit `model` field in frontmatter is ALSO subject to the ceiling — no way to exceed the config ceiling from a plan file
51
+ - **Retry escalation**: when a task fails and is retried, bump the model to `models.execute` (the ceiling) for the retry attempt. If the first attempt already used the ceiling model, retry with the same model. This gives failing tasks the best available model on subsequent attempts
52
+ - Implement a `getModelTier()` utility that returns a numeric tier for comparison (using pricing ratios or a simple ordered list)
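A minimal TypeScript sketch of the resolution order described above: explicit `model`, else `effort` via `effortMapping`, else the fallback; then the ceiling; and the ceiling again on retries. Tiers use the simple ordered-list option (haiku=1, sonnet=2, opus=3) with the family extracted from full IDs. `getModelTier` is named in the plan but its signature here is assumed; `resolveTaskModel` and `TaskModelInput` are hypothetical. On a tier tie the sketch keeps the requested model, so an explicit full ID is not replaced by the ceiling's alias; that tie-break is also an assumption.

```typescript
type EffortLabel = "low" | "medium" | "high";

// Default effortMapping from the plan; normally read from config.
const DEFAULT_EFFORT_MAPPING: Record<EffortLabel, string> = {
  low: "haiku",
  medium: "sonnet",
  high: "opus",
};

const FAMILY_TIERS: ReadonlyArray<readonly [string, number]> = [
  ["haiku", 1],
  ["sonnet", 2],
  ["opus", 3],
];
const HIGHEST_TIER = 3;

// Tier for a short alias ("sonnet") or a full ID ("claude-opus-4-6").
// Unknown families get the highest tier, so an unknown ceiling caps nothing.
function getModelTier(model: string): number {
  const parts = model.toLowerCase().split("-");
  for (const [family, tier] of FAMILY_TIERS) {
    if (parts.includes(family)) return tier;
  }
  return HIGHEST_TIER;
}

function isEffortLabel(value: string): value is EffortLabel {
  return value === "low" || value === "medium" || value === "high";
}

interface TaskModelInput {
  frontmatter: { effort?: string; model?: string } | null;
  ceiling: string; // models.execute
  attempt: number; // 1 = first run, >1 = retry
  effortMapping?: Record<EffortLabel, string>;
}

function resolveTaskModel({ frontmatter, ceiling, attempt, effortMapping = DEFAULT_EFFORT_MAPPING }: TaskModelInput): string {
  // Retries escalate to the ceiling; plans without frontmatter fall back to it.
  if (attempt > 1 || frontmatter === null) return ceiling;

  // An explicit model wins over the effort label.
  let requested = frontmatter.model;
  const effort = frontmatter.effort;
  if (!requested && effort !== undefined && isEffortLabel(effort)) {
    requested = effortMapping[effort];
  }
  if (!requested) return ceiling; // missing or invalid effort: caller should warn

  // Apply the ceiling: a higher-tier request is capped; ties keep the request.
  return getModelTier(requested) > getModelTier(ceiling) ? ceiling : requested;
}
```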
53
+
54
+ ### Remove Legacy Effort Config
55
+ - Remove the entire `effort.*` config section from `RafConfig` interface in `src/types/config.ts`
56
+ - Remove `EffortConfig`, `EffortScenario`, `EffortLevel` types, `VALID_EFFORTS` constant
57
+ - Remove `getEffort()` accessor from `src/utils/config.ts`
58
+ - Remove effort validation logic from config validation
59
+ - Remove `effortLevel` option from `ClaudeRunner` run options in `src/core/claude-runner.ts`
60
+ - Remove `CLAUDE_CODE_EFFORT_LEVEL` env var injection from `ClaudeRunner`
61
+ - Remove all `getEffort()` call sites:
62
+ - `src/commands/do.ts` — `getEffort('execute')` passed to runner options
63
+ - `src/commands/config.ts` — `getEffort('config')` and its fallback
64
+ - Any other call sites
65
+ - Remove effort from `src/prompts/config-docs.md`
66
+ - Update tests that reference effort config
67
+
68
+ ### Planning Prompt Changes
69
+ - Remove the "Plan Output Style" section that says "CRITICAL: Plans should be HIGH-LEVEL and CONCEPTUAL" from both `src/prompts/planning.ts` and `src/prompts/amend.ts`
70
+ - Remove the restrictive bullet points: "Describe WHAT needs to be done, not HOW to code it", "NO code snippets or implementation details in plans"
71
+ - Replace with neutral guidance: plans can include whatever level of detail the planner deems helpful
72
+ - Add instructions that the planner MUST include effort frontmatter on every task, with guidance on how to assess complexity:
73
+ - `low` — trivial/mechanical changes, simple one-file edits, config changes
74
+ - `medium` — well-scoped feature work, bug fixes with clear plans, multi-file changes following existing patterns
75
+ - `high` — architectural changes, complex logic, tasks requiring deep codebase understanding
76
+ - Document the frontmatter format (Obsidian-style, closing `---` only) in the prompt
77
+
78
+ ## Implementation Steps
79
+ 1. Remove the legacy effort config: delete `EffortConfig`, `EffortScenario`, `EffortLevel` types, `VALID_EFFORTS`, `getEffort()`, the `effort` key from `RafConfig` and `DEFAULT_CONFIG`, effort validation, `effortLevel` from `ClaudeRunner` run options, `CLAUDE_CODE_EFFORT_LEVEL` env var logic
80
+ 2. Remove all `getEffort()` call sites in `do.ts`, `config.ts`, and anywhere else
81
+ 3. Add new `effortMapping` config section: type definition, defaults (`{ low: "haiku", medium: "sonnet", high: "opus" }`), validation (values must be valid model names), accessor helper
82
+ 4. Implement model tier comparison: a `getModelTier()` utility that returns a numeric rank based on the model family (haiku=1, sonnet=2, opus=3). For full model IDs, extract the family name. For unknown models, default to the highest tier, so an unknown ceiling never caps execution
83
+ 5. Redefine `models.execute` semantics: it now acts as ceiling + fallback. Update its documentation to reflect this
84
+ 6. Create a frontmatter parser utility that extracts `model` and `effort` from plan file content (parse `key: value` lines before the closing `---` delimiter)
85
+ 7. Integrate frontmatter parsing into `state-derivation.ts` alongside the existing `parseDependencies()` — store parsed metadata on the task state object
86
+ 8. In `do.ts`, before each task execution, resolve the per-task model:
87
+ - Read frontmatter `model` or resolve `effort` via `effortMapping`
88
+ - Apply ceiling: `min(resolved_model, models.execute)` using tier comparison
89
+ - Fall back to `models.execute` if no frontmatter (fallback role)
90
+ 9. Implement retry escalation: when retrying a failed task, use `models.execute` (ceiling) instead of the original resolved model. The retry logic in `do.ts` should detect attempt > 1 and escalate
91
+ 10. Modify `ClaudeRunner` or the task execution loop to support per-task model (consider creating a new runner instance per task if the model differs)
92
+ 11. Update `src/prompts/planning.ts`: remove the restrictive "Plan Output Style" section, replace with neutral guidance, add required frontmatter format instructions with effort assessment criteria
93
+ 12. Update `src/prompts/amend.ts`: same prompt changes
94
+ 13. Update config-docs.md: remove effort section, add effortMapping section, update models.execute description to "ceiling"
95
+ 14. Update CLAUDE.md: update "Plan File Structure" to include frontmatter, remove effort references, document ceiling behavior
96
+ 15. Update/remove affected tests
97
+
98
+ ## Acceptance Criteria
99
+ - [ ] The entire `effort.*` config section is removed (types, defaults, validation, accessors, env var)
100
+ - [ ] `ClaudeRunner` no longer sets `CLAUDE_CODE_EFFORT_LEVEL`
101
+ - [ ] Existing config files with `effort` are handled gracefully (warning or silent ignore)
102
+ - [ ] `effortMapping` config exists with sensible defaults (low→haiku, medium→sonnet, high→opus)
103
+ - [ ] `models.execute` acts as a ceiling — resolved model is capped to this tier
104
+ - [ ] Ceiling works correctly: opus plan + sonnet ceiling = sonnet execution
105
+ - [ ] Under-ceiling works correctly: haiku plan + sonnet ceiling = haiku execution
106
+ - [ ] Retry escalation: failed task retries use the ceiling model
107
+ - [ ] Plan files with frontmatter are parsed correctly (effort and optional model extracted)
108
+ - [ ] Plan files without frontmatter produce a warning and fall back to config model
109
+ - [ ] Effort label in frontmatter correctly maps to a model via `effortMapping`
110
+ - [ ] Explicit `model` in frontmatter takes precedence over `effort` mapping but is still subject to ceiling
111
+ - [ ] Planning prompts no longer restrict implementation details
112
+ - [ ] Planning prompts mandate effort frontmatter with assessment guidelines
113
+ - [ ] Invalid frontmatter values produce a warning but don't block execution
114
+ - [ ] Frontmatter parsing doesn't break existing plan files (backwards compatible)
115
+ - [ ] Tests cover effort removal, effortMapping, ceiling logic, frontmatter parsing, and override logic
116
+
117
+ ## Notes
118
+ - The frontmatter parser should be lenient: ignore unknown keys, handle missing `---` delimiter gracefully, treat malformed properties as "no frontmatter" rather than erroring. The format is Obsidian-style: `key: value` lines at the top of the file, terminated by a `---` line (no opening delimiter).
119
+ - This task depends on task 04 (version/model display) since the per-task model override should be visible in the execution log line.
120
+ - The `parseDependencies()` function in `state-derivation.ts` already reads plan file content — the frontmatter parser can be called at the same point, avoiding a second file read.
121
+ - Removing effort config is a breaking change for users who have `effort.*` in their config file. The config validator should handle this gracefully (warn about unknown key, don't crash).
122
+ - Model tier comparison for full model IDs: extract the family name (e.g., `claude-opus-4-6` → `opus`) and use the same tier ordering. Unknown families should default to the highest tier so they're never accidentally capped.
123
+ - The ceiling concept also applies to the explicit `model` frontmatter field — a plan cannot exceed the user's configured budget. This is intentional: the user always has final say on cost.
@@ -0,0 +1,37 @@
1
+ # Project Decisions
2
+
3
+ ## For `raf config --get/--set`: Should `--get` with no key show the full merged config (defaults + overrides), or only user overrides from the config file?
4
+ Full merged config — Shows the complete resolved config with all defaults filled in, so users see every active setting.
5
+
6
+ ## For `raf config --set`: Should setting a value to its default automatically remove it from the config file (keeping config minimal), or always write explicitly?
7
+ Remove if default — If the value matches the default, remove the key from the config file to keep it clean.
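A minimal TypeScript sketch of the two `raf config` behaviors decided here: `--get` reads from the defaults deep-merged with user overrides, and `--set` drops a key (and any parent object left empty) when the new value equals its default. The helper names, the dotted-path keys, and the plain-object config shape are assumptions, and the default comparison only handles primitive values.

```typescript
type ConfigValue = string | number | boolean | ConfigObject;
interface ConfigObject {
  [key: string]: ConfigValue;
}

function isObject(v: ConfigValue | undefined): v is ConfigObject {
  return typeof v === "object" && v !== null;
}

// Defaults with user overrides applied recursively (what `--get` shows).
function deepMerge(defaults: ConfigObject, overrides: ConfigObject): ConfigObject {
  const out: ConfigObject = { ...defaults };
  for (const [key, value] of Object.entries(overrides)) {
    const base = out[key];
    out[key] = isObject(base) && isObject(value) ? deepMerge(base, value) : value;
  }
  return out;
}

// Read a dotted key such as "models.plan"; undefined if any segment is missing.
function getPath(obj: ConfigObject, path: string): ConfigValue | undefined {
  let current: ConfigValue | undefined = obj;
  for (const part of path.split(".")) {
    if (!isObject(current) || !Object.prototype.hasOwnProperty.call(current, part)) return undefined;
    current = current[part];
  }
  return current;
}

// Return a new user config with `path` set, or removed if `value` is the default.
function setConfigValue(user: ConfigObject, defaults: ConfigObject, path: string, value: ConfigValue): ConfigObject {
  const isDefault = getPath(defaults, path) === value; // primitives only
  return setIn(user, path.split("."), isDefault ? undefined : value);
}

function setIn(obj: ConfigObject, parts: string[], value: ConfigValue | undefined): ConfigObject {
  const [head, ...rest] = parts;
  const out: ConfigObject = { ...obj };
  if (rest.length === 0) {
    if (value === undefined) delete out[head];
    else out[head] = value;
    return out;
  }
  const existing = out[head];
  const child = setIn(isObject(existing) ? existing : {}, rest, value);
  // Prune parent objects that became empty so the file stays minimal.
  if (Object.keys(child).length === 0) delete out[head];
  else out[head] = child;
  return out;
}
```

With hypothetical defaults `{ models: { plan: "opus", execute: "opus" } }`, setting `models.execute` to `sonnet` writes `{ models: { execute: "sonnet" } }`, and setting it back to `opus` leaves an empty object. Printing `String(getPath(merged, "models.plan"))` yields the plain-value output decided for `--get models.plan`.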
8
+
9
+ ## For the diverged main branch fix: Should callers treat this as a hard error or a warning?
10
+ Warning only — Show a visible warning (yellow) but continue execution. Stale base is better than blocking work.
11
+
12
+ ## For the name generation fix: Should the fix tighten the prompt, add response parsing, or both?
13
+ Prompt only — Just improve the prompt instructions to be more explicit about output format.
14
+
15
+ ## For `raf config --get models.plan`: Should the output be plain value only or formatted?
16
+ Plain value — Just print 'opus'. Easy to pipe into other commands.
17
+
18
+ ## For the token investigation task: Research only or plan fix now?
19
+ Research now and create task if needed — investigated and found it's a display issue: `input_tokens` from Claude API only reports non-cached tokens. Fix plan created as task 07 to show total input in display.
20
+
21
+ ## Token investigation findings
22
+ Root cause: the Claude API separates `input_tokens` (non-cached, e.g. 3-22) from `cache_read_input_tokens` (e.g. 18,000+) and `cache_creation_input_tokens` (e.g. 6,000+). RAF displays only the non-cached count as "X in". Fix: sum all three for the display. Cost calculation is already correct.
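A minimal TypeScript sketch of the display fix: sum the three input fields named in this finding for the "X in" figure and leave cost calculation untouched. The `Usage` interface is a partial, assumed shape of the API usage object (cache fields treated as optional), and the "/ Y out" suffix in `formatTokenSummary` is an assumed format.

```typescript
// Partial usage shape; field names as listed in the finding.
interface Usage {
  input_tokens: number;
  output_tokens: number;
  cache_read_input_tokens?: number | null;
  cache_creation_input_tokens?: number | null;
}

// Total input = non-cached + cache reads + cache writes.
function totalInputTokens(u: Usage): number {
  return u.input_tokens + (u.cache_read_input_tokens ?? 0) + (u.cache_creation_input_tokens ?? 0);
}

// Display string only; cost calculation keeps using the separate fields.
function formatTokenSummary(u: Usage): string {
  return `${totalInputTokens(u).toLocaleString("en-US")} in / ${u.output_tokens.toLocaleString("en-US")} out`;
}
```

With the magnitudes above (22 non-cached, 18,000 cache reads, 6,000 cache writes), the input figure becomes 24,022 instead of 22.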
23
+
24
+ ## For amend commit fix: Should plan files be removed from the commit or committed separately?
25
+ Remove from commit — Only commit input.md and decisions.md during planning. Plan files get committed by Claude during execution.
26
+
27
+ ## For syncing worktree branch with main before execution: Rebase or merge?
28
+ Rebase onto main — Cleaner history, replays worktree commits on top of latest main.
29
+
30
+ ## If the rebase has conflicts, should `raf do` abort or skip sync?
31
+ Skip sync and warn — Skip the sync step, show a warning, and continue execution on the current branch state.
32
+
33
+ ## For removing token report from plan: Should `session-parser.ts` and its tests be deleted or kept?
34
+ Delete session-parser entirely — Remove session-parser.ts and its test file since they have no other consumers.
35
+
36
+ ## Should the `sessionId` parameter be removed from `runInteractive()` in claude-runner.ts?
37
+ Remove sessionId from runInteractive() too — Clean up the runner API as well since no other callers pass sessionId.