rafcode 2.3.0 → 2.4.1-0
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/.claude/settings.local.json +3 -1
- package/CLAUDE.md +21 -4
- package/RAF/ahvrih-rate-forge/decisions.md +70 -0
- package/RAF/ahvrih-rate-forge/input.md +44 -0
- package/RAF/ahvrih-rate-forge/outcomes/01-remove-claude-command-config.md +58 -0
- package/RAF/ahvrih-rate-forge/outcomes/02-fix-mixed-attempt-cost.md +46 -0
- package/RAF/ahvrih-rate-forge/outcomes/03-rate-limit-estimation.md +82 -0
- package/RAF/ahvrih-rate-forge/outcomes/04-show-version-in-do-logs.md +45 -0
- package/RAF/ahvrih-rate-forge/outcomes/05-sync-main-before-worktree.md +96 -0
- package/RAF/ahvrih-rate-forge/outcomes/06-sync-readme-with-codebase.md +45 -0
- package/RAF/ahvrih-rate-forge/outcomes/07-no-session-persistence.md +26 -0
- package/RAF/ahvrih-rate-forge/outcomes/08-plan-execution-metadata.md +130 -0
- package/RAF/ahvrih-rate-forge/plans/01-remove-claude-command-config.md +36 -0
- package/RAF/ahvrih-rate-forge/plans/02-fix-mixed-attempt-cost.md +33 -0
- package/RAF/ahvrih-rate-forge/plans/03-rate-limit-estimation.md +82 -0
- package/RAF/ahvrih-rate-forge/plans/04-show-version-in-do-logs.md +32 -0
- package/RAF/ahvrih-rate-forge/plans/05-sync-main-before-worktree.md +40 -0
- package/RAF/ahvrih-rate-forge/plans/06-sync-readme-with-codebase.md +61 -0
- package/RAF/ahvrih-rate-forge/plans/07-no-session-persistence.md +28 -0
- package/RAF/ahvrih-rate-forge/plans/08-plan-execution-metadata.md +123 -0
- package/RAF/ahwidh-quick-fix-gremlin/decisions.md +37 -0
- package/RAF/ahwidh-quick-fix-gremlin/input.md +35 -0
- package/RAF/ahwidh-quick-fix-gremlin/outcomes/01-fix-name-generation-prompt.md +33 -0
- package/RAF/ahwidh-quick-fix-gremlin/outcomes/02-fix-amend-commit-scope.md +43 -0
- package/RAF/ahwidh-quick-fix-gremlin/outcomes/03-fix-diverged-main-branch-sync.md +32 -0
- package/RAF/ahwidh-quick-fix-gremlin/outcomes/04-wire-rate-limit-to-do-command.md +61 -0
- package/RAF/ahwidh-quick-fix-gremlin/outcomes/05-add-config-get-set-flags.md +125 -0
- package/RAF/ahwidh-quick-fix-gremlin/outcomes/06-sync-worktree-branch-before-execution.md +96 -0
- package/RAF/ahwidh-quick-fix-gremlin/outcomes/07-update-frontmatter-format.md +107 -0
- package/RAF/ahwidh-quick-fix-gremlin/outcomes/08-remove-plan-token-report.md +76 -0
- package/RAF/ahwidh-quick-fix-gremlin/plans/01-fix-name-generation-prompt.md +52 -0
- package/RAF/ahwidh-quick-fix-gremlin/plans/02-fix-amend-commit-scope.md +48 -0
- package/RAF/ahwidh-quick-fix-gremlin/plans/03-fix-diverged-main-branch-sync.md +49 -0
- package/RAF/ahwidh-quick-fix-gremlin/plans/04-wire-rate-limit-to-do-command.md +78 -0
- package/RAF/ahwidh-quick-fix-gremlin/plans/05-add-config-get-set-flags.md +101 -0
- package/RAF/ahwidh-quick-fix-gremlin/plans/06-sync-worktree-branch-before-execution.md +92 -0
- package/RAF/ahwidh-quick-fix-gremlin/plans/07-update-frontmatter-format.md +105 -0
- package/RAF/ahwidh-quick-fix-gremlin/plans/08-remove-plan-token-report.md +50 -0
- package/README.md +27 -7
- package/dist/commands/config.d.ts.map +1 -1
- package/dist/commands/config.js +209 -6
- package/dist/commands/config.js.map +1 -1
- package/dist/commands/do.d.ts.map +1 -1
- package/dist/commands/do.js +140 -21
- package/dist/commands/do.js.map +1 -1
- package/dist/commands/plan.d.ts.map +1 -1
- package/dist/commands/plan.js +27 -5
- package/dist/commands/plan.js.map +1 -1
- package/dist/core/claude-runner.d.ts +0 -6
- package/dist/core/claude-runner.d.ts.map +1 -1
- package/dist/core/claude-runner.js +4 -9
- package/dist/core/claude-runner.js.map +1 -1
- package/dist/core/failure-analyzer.d.ts.map +1 -1
- package/dist/core/failure-analyzer.js +3 -3
- package/dist/core/failure-analyzer.js.map +1 -1
- package/dist/core/pull-request.js +3 -3
- package/dist/core/pull-request.js.map +1 -1
- package/dist/core/state-derivation.d.ts +5 -0
- package/dist/core/state-derivation.d.ts.map +1 -1
- package/dist/core/state-derivation.js +14 -4
- package/dist/core/state-derivation.js.map +1 -1
- package/dist/core/worktree.d.ts +44 -0
- package/dist/core/worktree.d.ts.map +1 -1
- package/dist/core/worktree.js +247 -0
- package/dist/core/worktree.js.map +1 -1
- package/dist/prompts/amend.d.ts.map +1 -1
- package/dist/prompts/amend.js +28 -11
- package/dist/prompts/amend.js.map +1 -1
- package/dist/prompts/planning.d.ts.map +1 -1
- package/dist/prompts/planning.js +28 -11
- package/dist/prompts/planning.js.map +1 -1
- package/dist/types/config.d.ts +30 -13
- package/dist/types/config.d.ts.map +1 -1
- package/dist/types/config.js +14 -10
- package/dist/types/config.js.map +1 -1
- package/dist/utils/config.d.ts +47 -4
- package/dist/utils/config.d.ts.map +1 -1
- package/dist/utils/config.js +176 -30
- package/dist/utils/config.js.map +1 -1
- package/dist/utils/frontmatter.d.ts +53 -0
- package/dist/utils/frontmatter.d.ts.map +1 -0
- package/dist/utils/frontmatter.js +115 -0
- package/dist/utils/frontmatter.js.map +1 -0
- package/dist/utils/name-generator.d.ts.map +1 -1
- package/dist/utils/name-generator.js +9 -19
- package/dist/utils/name-generator.js.map +1 -1
- package/dist/utils/session-parser.d.ts +44 -0
- package/dist/utils/session-parser.d.ts.map +1 -0
- package/dist/utils/session-parser.js +122 -0
- package/dist/utils/session-parser.js.map +1 -0
- package/dist/utils/terminal-symbols.d.ts +22 -3
- package/dist/utils/terminal-symbols.d.ts.map +1 -1
- package/dist/utils/terminal-symbols.js +52 -18
- package/dist/utils/terminal-symbols.js.map +1 -1
- package/dist/utils/token-tracker.d.ts +20 -0
- package/dist/utils/token-tracker.d.ts.map +1 -1
- package/dist/utils/token-tracker.js +57 -2
- package/dist/utils/token-tracker.js.map +1 -1
- package/package.json +1 -1
- package/src/commands/config.ts +242 -7
- package/src/commands/do.ts +177 -23
- package/src/commands/plan.ts +27 -4
- package/src/core/claude-runner.ts +4 -16
- package/src/core/failure-analyzer.ts +3 -3
- package/src/core/pull-request.ts +3 -3
- package/src/core/state-derivation.ts +20 -4
- package/src/core/worktree.ts +266 -0
- package/src/prompts/amend.ts +28 -11
- package/src/prompts/config-docs.md +91 -29
- package/src/prompts/planning.ts +28 -11
- package/src/types/config.ts +46 -21
- package/src/utils/config.ts +200 -33
- package/src/utils/frontmatter.ts +140 -0
- package/src/utils/name-generator.ts +9 -19
- package/src/utils/terminal-symbols.ts +68 -16
- package/src/utils/token-tracker.ts +65 -2
- package/tests/unit/claude-runner-interactive.test.ts +8 -6
- package/tests/unit/claude-runner.test.ts +5 -66
- package/tests/unit/commit-planning-artifacts-worktree.test.ts +6 -14
- package/tests/unit/commit-planning-artifacts.test.ts +4 -12
- package/tests/unit/config-command.test.ts +176 -6
- package/tests/unit/config.test.ts +268 -45
- package/tests/unit/frontmatter.test.ts +276 -0
- package/tests/unit/name-generator.test.ts +1 -1
- package/tests/unit/post-execution-picker.test.ts +6 -0
- package/tests/unit/terminal-symbols.test.ts +142 -0
- package/tests/unit/token-tracker.test.ts +304 -1
- package/tests/unit/validation.test.ts +6 -4
- package/tests/unit/worktree.test.ts +309 -0
|
@@ -0,0 +1,130 @@
# Outcome: Add Per-Task Execution Metadata + Remove Effort Config

## Summary
Implemented Obsidian-style frontmatter for plan files with required `effort` metadata, introduced `effortMapping` config section, redefined `models.execute` as a ceiling and fallback, removed the legacy `effort.*` config section entirely, and relaxed planning prompt restrictions.

## Key Changes

### Types (`src/types/config.ts`)
- Removed `EffortConfig`, `EffortScenario`, `EffortLevel` types
- Removed `VALID_EFFORTS` constant
- Removed `effort` from `RafConfig` interface
- Added `TaskEffortLevel` type (`'low' | 'medium' | 'high'`)
- Added `EffortMappingConfig` type (`{ low: ClaudeModelName; medium: ClaudeModelName; high: ClaudeModelName }`)
- Added `VALID_TASK_EFFORTS` constant (`['low', 'medium', 'high']`)
- Updated `DEFAULT_CONFIG` with `effortMapping: { low: 'haiku', medium: 'sonnet', high: 'opus' }`

### Config Utilities (`src/utils/config.ts`)
- Removed `getEffort()` accessor
- Removed effort validation logic
- Added `effortMapping` to `VALID_TOP_LEVEL_KEYS`
- Added `effortMapping` validation (values must be valid model names)
- Added `getEffortMapping()` accessor
- Added `resolveEffortToModel(effort)` function
- Added `MODEL_TIER_ORDER` constant for tier comparison
- Added `getModelTier(modelName)` function:
  - Returns numeric tier: haiku=1, sonnet=2, opus=3
  - Extracts family from full model IDs (e.g., `claude-opus-4-6` → opus)
  - Unknown models default to tier 3 (no cap)
- Added `applyModelCeiling(resolvedModel, ceiling?)` function:
  - Caps resolved model to the ceiling tier
  - Uses `models.execute` as default ceiling
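
The tier comparison and ceiling described above could look roughly like the sketch below. It is an illustration under assumptions, not the code shipped in `src/utils/config.ts`: the family match by substring is a guess at how "extracts family from full model IDs" might work, and the `ceiling` argument is required here, whereas the real function is described as defaulting to `models.execute`.

```typescript
// Sketch only: mirrors the behaviour listed above; the real signatures may differ.
type ModelFamily = 'haiku' | 'sonnet' | 'opus';

const MODEL_TIER_ORDER: Record<ModelFamily, number> = { haiku: 1, sonnet: 2, opus: 3 };

// Unknown models default to the highest tier (the source calls this "no cap").
const UNKNOWN_MODEL_TIER = 3;

function getModelTier(modelName: string): number {
  const lower = modelName.toLowerCase();
  // Assumption: the family name appears as a substring of both short aliases
  // ("opus") and full model IDs ("claude-opus-4-6").
  for (const family of Object.keys(MODEL_TIER_ORDER) as ModelFamily[]) {
    if (lower.indexOf(family) !== -1) return MODEL_TIER_ORDER[family];
  }
  return UNKNOWN_MODEL_TIER;
}

function applyModelCeiling(resolvedModel: string, ceiling: string): string {
  // Cap only when the resolved model sits strictly above the ceiling tier.
  return getModelTier(resolvedModel) > getModelTier(ceiling) ? ceiling : resolvedModel;
}
```

With this shape, a ceiling that is not recognised gets tier 3, so nothing can sit above it and no capping takes place.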

### Frontmatter Parser (`src/utils/frontmatter.ts`) - NEW FILE
- Parses Obsidian-style frontmatter from plan file content
- Format: `key: value` lines at top, terminated by `---` (no opening delimiter)
- Extracts `effort` (required) and `model` (optional) fields
- Case-insensitive effort values
- Returns warnings for invalid/unknown keys (doesn't throw)
- Handles missing delimiter gracefully (returns empty frontmatter)
- Detects markdown headings before delimiter (invalid frontmatter)
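
A minimal parser following these rules might be sketched as below. The function name `parsePlanFrontmatter`, the result shape, and the warning texts are hypothetical; only the listed behaviours (closing `---` without an opening delimiter, case-insensitive `effort`, warnings instead of exceptions, headings before the delimiter invalidating the block) come from the notes above.

```typescript
type TaskEffortLevel = 'low' | 'medium' | 'high';

interface PlanFrontmatter {
  effort?: TaskEffortLevel;
  model?: string;
}

interface FrontmatterParseResult {
  frontmatter: PlanFrontmatter;
  warnings: string[];
}

const VALID_TASK_EFFORTS: readonly string[] = ['low', 'medium', 'high'];

// Hypothetical name and return shape; the real parser may differ.
function parsePlanFrontmatter(content: string): FrontmatterParseResult {
  const lines = content.split(/\r?\n/);
  const end = lines.findIndex((line) => line.trim() === '---');
  // No closing delimiter: treat the file as having no frontmatter.
  if (end === -1) return { frontmatter: {}, warnings: [] };

  const header = lines.slice(0, end);
  // A markdown heading before the delimiter means the "---" belongs to the body.
  if (header.some((line) => /^\s*#/.test(line))) return { frontmatter: {}, warnings: [] };

  const frontmatter: PlanFrontmatter = {};
  const warnings: string[] = [];
  for (const line of header) {
    if (line.trim() === '') continue;
    const colon = line.indexOf(':');
    if (colon === -1) {
      warnings.push(`Ignoring malformed frontmatter line: ${line.trim()}`);
      continue;
    }
    const key = line.slice(0, colon).trim();
    const value = line.slice(colon + 1).trim();
    if (key === 'effort') {
      const effort = value.toLowerCase(); // effort values are case-insensitive
      if (VALID_TASK_EFFORTS.indexOf(effort) !== -1) frontmatter.effort = effort as TaskEffortLevel;
      else warnings.push(`Invalid effort value: ${value}`);
    } else if (key === 'model') {
      if (value !== '') frontmatter.model = value;
      else warnings.push('Empty model value');
    } else {
      warnings.push(`Unknown frontmatter key: ${key}`);
    }
  }
  return { frontmatter, warnings };
}
```

Returning warnings rather than throwing keeps older plan files, which have no frontmatter at all, loadable.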

### State Derivation (`src/core/state-derivation.ts`)
- Added frontmatter parsing alongside dependency parsing
- Extended `DerivedTask` interface with:
  - `frontmatter?: PlanFrontmatter` - parsed metadata
  - `frontmatterWarnings?: string[]` - parsing warnings

### Do Command (`src/commands/do.ts`)
- Removed `getEffort()` usage
- Added `resolveTaskModel()` helper function:
  - Uses explicit `model` frontmatter if present
  - Falls back to `effort` → `effortMapping` resolution
  - Applies ceiling using `applyModelCeiling()`
  - Returns `{ model, source }` for logging
- Creates new `ClaudeRunner` per task with resolved model
- Logs missing frontmatter warnings
- Implements retry escalation: failed tasks retry with ceiling model
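
The precedence order and retry escalation listed above can be sketched as follows. This is an assumption-laden illustration: the parameter list, the `attempt` counter, and the `source` labels are invented for the example, and the tier lookup only understands short aliases.

```typescript
type Effort = 'low' | 'medium' | 'high';
// Hypothetical labels; the source only says that a `source` is returned for logging.
type ModelSource = 'frontmatter-model' | 'effort-mapping' | 'config-fallback' | 'retry-escalation';

interface TaskFrontmatter { effort?: Effort; model?: string }
interface ResolvedModel { model: string; source: ModelSource }

// Simplified tier lookup (short aliases only); unknown names get the top tier.
const TIERS: Record<string, number> = { haiku: 1, sonnet: 2, opus: 3 };
const tierOf = (model: string): number => TIERS[model] ?? 3;
const cap = (model: string, ceiling: string): string =>
  tierOf(model) > tierOf(ceiling) ? ceiling : model;

function resolveTaskModel(
  frontmatter: TaskFrontmatter | undefined,
  effortMapping: Record<Effort, string>,
  ceiling: string, // stands in for models.execute
  attempt: number, // 1-based attempt counter (hypothetical parameter)
): ResolvedModel {
  // Retry escalation: any attempt after the first uses the ceiling model.
  if (attempt > 1) return { model: ceiling, source: 'retry-escalation' };
  // An explicit model wins over effort, but is still subject to the ceiling.
  if (frontmatter?.model) {
    return { model: cap(frontmatter.model, ceiling), source: 'frontmatter-model' };
  }
  if (frontmatter?.effort) {
    return { model: cap(effortMapping[frontmatter.effort], ceiling), source: 'effort-mapping' };
  }
  // No usable frontmatter: fall back to the configured model.
  return { model: ceiling, source: 'config-fallback' };
}
```

For example, with the default mapping and a `sonnet` ceiling, a `high` effort plan resolves to `sonnet` on the first attempt, while a `low` effort plan starts on `haiku` and moves to `sonnet` only if it has to retry.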

### Config Command (`src/commands/config.ts`)
- Removed `getEffort()` usage and fallback

### Claude Runner (`src/core/claude-runner.ts`)
- Removed `effortLevel` option from `ClaudeRunnerOptions`
- Removed `CLAUDE_CODE_EFFORT_LEVEL` env var injection

### Planning Prompts (`src/prompts/planning.ts`, `src/prompts/amend.ts`)
- Removed restrictive "Plan Output Style" section
- Removed "NO code snippets or implementation details" restrictions
- Added frontmatter format requirements with effort assessment guidelines:
  - `low`: trivial/mechanical changes, simple one-file edits
  - `medium`: well-scoped features, bug fixes, multi-file changes
  - `high`: architectural changes, complex logic, deep codebase understanding

### Documentation
- **`src/prompts/config-docs.md`**:
  - Removed entire `effort` section
  - Added `effortMapping` section with defaults and validation rules
  - Updated `models.execute` description to document ceiling/fallback behavior
- **`CLAUDE.md`**:
  - Updated "Plan File Structure" to include frontmatter format
  - Documented effort metadata and model resolution
  - Removed effort config references
  - Added ceiling behavior documentation

### Tests
- **`tests/unit/config.test.ts`**:
  - Removed effort config tests
  - Added `effortMapping` validation tests
  - Added `getModelTier()` tests
  - Added `applyModelCeiling()` tests
  - Added `resolveEffortToModel()` tests
- **`tests/unit/config-command.test.ts`**:
  - Updated tests to use `effortMapping` instead of `effort`
- **`tests/unit/frontmatter.test.ts`** - NEW FILE:
  - Comprehensive tests for frontmatter parsing
  - Valid frontmatter tests (effort, model, both)
  - No frontmatter tests (missing delimiter, empty content, markdown heading)
  - Warning tests (unknown keys, invalid values)
  - Edge cases (whitespace, tabs, multiple delimiters)
- **`tests/unit/claude-runner.test.ts`**:
  - Removed `effortLevel` tests
  - Updated to test environment passing without effort override
- **`tests/unit/claude-runner-interactive.test.ts`**:
  - Updated default model test to accept both short aliases and full model IDs
  - Updated environment test to not depend on user's env vars
- **`tests/unit/validation.test.ts`**:
  - Updated default model test to accept config-dependent values

## Acceptance Criteria Verification
- [x] The entire `effort.*` config section is removed (types, defaults, validation, accessors, env var)
- [x] `ClaudeRunner` no longer sets `CLAUDE_CODE_EFFORT_LEVEL`
- [x] Existing config files with `effort` are handled gracefully (rejected as unknown key with warning)
- [x] `effortMapping` config exists with sensible defaults (low→haiku, medium→sonnet, high→opus)
- [x] `models.execute` acts as a ceiling: the resolved model is capped to this tier
- [x] Ceiling works correctly: opus plan + sonnet ceiling = sonnet execution
- [x] Under-ceiling works correctly: haiku plan + sonnet ceiling = haiku execution
- [x] Retry escalation: failed task retries use the ceiling model
- [x] Plan files with frontmatter are parsed correctly (effort and optional model extracted)
- [x] Plan files without frontmatter produce a warning and fall back to config model
- [x] Effort label in frontmatter correctly maps to a model via `effortMapping`
- [x] Explicit `model` in frontmatter takes precedence over `effort` mapping but is still subject to ceiling
- [x] Planning prompts no longer restrict implementation details
- [x] Planning prompts mandate effort frontmatter with assessment guidelines
- [x] Invalid frontmatter values produce a warning but don't block execution
- [x] Frontmatter parsing doesn't break existing plan files (backwards compatible)
- [x] Tests cover effort removal, effortMapping, ceiling logic, frontmatter parsing, and override logic
- [x] All 1273 tests pass
- [x] TypeScript builds successfully

<promise>COMPLETE</promise>
@@ -0,0 +1,36 @@
# Task: Remove `claudeCommand` from Config

## Objective
Remove the `claudeCommand` configuration key entirely, hardcoding `"claude"` as the CLI binary name.

## Context
The `claudeCommand` config key allows overriding the Claude CLI binary path. In practice this is unnecessary: Claude CLI is always installed as `claude`. Removing it simplifies the config schema and also resolves the PR #4 review comment: with a broken config file, `getClaudeCommand()` could throw before `raf config` launched its repair session. Hardcoding eliminates that failure path.

## Requirements
- Remove `claudeCommand` from `RafConfig` interface and `DEFAULT_CONFIG` in `src/types/config.ts`
- Remove `getClaudeCommand()` accessor from `src/utils/config.ts`
- Update `getClaudePath()` in `src/core/claude-runner.ts` to hardcode `"claude"` instead of calling `getClaudeCommand()`
- Remove `claudeCommand` from config validation logic in `src/utils/config.ts`
- Update `src/prompts/config-docs.md` to remove the `claudeCommand` section
- Update any tests that reference `claudeCommand`
- Verify `raf config` works correctly even when `~/.raf/raf.config.json` is malformed (this is the PR #4 fix: with the hardcoded command, `getClaudePath` no longer depends on config)

## Implementation Steps
1. Remove `claudeCommand` from the TypeScript interface and default config
2. Remove the `getClaudeCommand()` helper and update all call sites to use `"claude"` directly
3. Update `getClaudePath()` to use hardcoded `"claude"` in the `which` lookup
4. Remove `claudeCommand` from config validation (the strict validator should reject it as unknown key if a user still has it; consider adding a migration warning or silently ignoring it)
5. Update config-docs.md documentation
6. Update/remove affected tests
7. Verify the `raf config` fallback path no longer depends on config file state

## Acceptance Criteria
- [ ] `claudeCommand` key no longer exists in types, defaults, validation, or documentation
- [ ] `getClaudePath()` works without reading any config
- [ ] `raf config` launches successfully even with a completely broken config file
- [ ] All existing tests pass (updated as needed)
- [ ] Config files containing `claudeCommand` are handled gracefully (warning or silent ignore)

## Notes
- This also addresses the PR #4 review comment about `raf config` being unusable as a repair path when config is malformed. With the hardcoded command, the entire Claude runner initialization is config-independent.
- Consider whether to warn users who still have `claudeCommand` in their config or just silently ignore it via validation.
@@ -0,0 +1,33 @@
# Task: Fix Mixed-Attempt Cost Underreporting

## Objective
Fix cost calculation to compute cost per-attempt rather than on accumulated usage, preventing underreporting when attempts have mixed aggregate-only and per-model usage data.

## Context
`TokenTracker.addTask()` currently calls `accumulateUsage(attempts)` to merge all attempts into one `UsageData`, then calls `calculateCost()` on the merged result. The problem: if some attempts have `modelUsage` populated and others only have aggregate fields (which `extractUsageData` allows), the merged result has a non-empty `modelUsage` map. `calculateCost()` then takes the per-model branch and only prices tokens in `modelUsage`, silently dropping aggregate-only tokens from attempts that lacked `modelUsage`. This means mixed-attempt retries underreport cost.

## Requirements
- Calculate cost independently for each attempt's `UsageData`
- Each attempt uses per-model pricing if it has `modelUsage`, or aggregate-fallback (Sonnet rates) if it doesn't
- Sum the per-attempt costs to get the task total
- The per-attempt cost calculation should also be available for the display formatter (it already receives a `calculateAttemptCost` callback)
- Preserve the accumulated usage totals for token count display (input/output/cache totals should still be summed across attempts)

## Implementation Steps
1. Modify `addTask()` in `TokenTracker` to calculate cost per-attempt, then sum into the task's `CostBreakdown`
2. Ensure `calculateCost()` is called on individual attempt `UsageData` objects, not on the accumulated merge
3. Update the `CostBreakdown` aggregation to sum per-attempt breakdowns
4. Verify that `formatTaskTokenSummary()` still works correctly; it receives per-attempt cost via callback, so the callback should use single-attempt `calculateCost()`
5. Add test cases covering the mixed-attempt scenario: one attempt with `modelUsage`, another with only aggregate fields
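
The difference between pricing the merged usage and pricing each attempt can be shown with a small sketch. The types are simplified stand-ins for the real `UsageData` and `CostBreakdown`, `calculateTaskCost` is a hypothetical helper name, and the per-million-token prices are illustrative values chosen only to match the ratios quoted elsewhere in these plans.

```typescript
interface TokenCounts { inputTokens: number; outputTokens: number }

interface UsageData extends TokenCounts {
  // Per-model split; absent when an attempt reported only aggregate totals.
  modelUsage?: Record<string, TokenCounts>;
}

interface Price { inputPerMTok: number; outputPerMTok: number }

// Illustrative prices in USD per million tokens, not authoritative rates.
const SONNET_PRICE: Price = { inputPerMTok: 3, outputPerMTok: 15 };
const PRICING: Record<string, Price> = {
  sonnet: SONNET_PRICE,
  opus: { inputPerMTok: 5, outputPerMTok: 25 },
};

const priceTokens = (t: TokenCounts, p: Price): number =>
  (t.inputTokens * p.inputPerMTok + t.outputTokens * p.outputPerMTok) / 1e6;

function calculateCost(usage: UsageData): number {
  const models = Object.keys(usage.modelUsage ?? {});
  if (models.length > 0) {
    // Per-model branch: only tokens listed in modelUsage are priced.
    return models.reduce((sum, model) => {
      const tokens = (usage.modelUsage as Record<string, TokenCounts>)[model];
      return sum + priceTokens(tokens, PRICING[model] ?? SONNET_PRICE);
    }, 0);
  }
  // Aggregate fallback: price everything at Sonnet rates.
  return priceTokens(usage, SONNET_PRICE);
}

// The fix: price each attempt on its own, then sum.
function calculateTaskCost(attempts: UsageData[]): number {
  return attempts.reduce((sum, attempt) => sum + calculateCost(attempt), 0);
}
```

Calling `calculateCost` on a merged object that carries one attempt's `modelUsage` prices only that attempt's tokens, which is exactly the underreporting described in the Context section; summing per attempt avoids it.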

## Acceptance Criteria
- [ ] Cost is calculated per-attempt, not on merged usage
- [ ] Mixed attempts (some with modelUsage, some without) report accurate total cost
- [ ] Per-attempt display in multi-attempt summaries shows correct individual costs
- [ ] Grand total cost across all tasks remains accurate
- [ ] New test cases cover the mixed-attempt edge case
- [ ] Existing token tracking tests still pass

## Notes
- The key insight: `accumulateUsage()` is fine for summing token counts for display, but cost calculation must happen before merging to respect the per-model vs. aggregate distinction per attempt.
- The `formatTaskTokenSummary` already accepts a `calculateAttemptCost` callback; this callback should call `calculateCost` on individual attempt data, which is already the correct granularity.
@@ -0,0 +1,82 @@
# Task: Add 5h Window Rate Limit Estimation + Plan Session Token Tracking

## Objective
Add an estimated percentage of the 5-hour rate limit window consumed, displayed after each task and in the grand total summary. Also add token usage tracking and display for `raf plan` interactive sessions.

## Dependencies
02

## Context
Anthropic's subscription plans use a shared credit pool per 5-hour window. The pool is measured in cost-weighted credits, not raw token count. Heavier models (Opus) consume the pool faster than lighter ones (Haiku) in proportion to their API pricing ratios. Users need visibility into how much of their 5-hour window they've consumed during a RAF session.

The baseline is 88,000 Sonnet-equivalent tokens per 5h window. All token usage is normalized to Sonnet-equivalent tokens using the API pricing ratios:
- Haiku input/output costs ~1/3 of Sonnet → 1 Haiku token ≈ 0.33 Sonnet tokens
- Opus input/output costs ~1.67× Sonnet → 1 Opus token ≈ 1.67 Sonnet tokens
- Cache read/create tokens follow the same model-specific pricing ratios

## Requirements

### Rate Limit Estimation (raf do)
- Convert all token usage to "Sonnet-equivalent tokens" using the configured pricing ratios
- The conversion formula: `sonnetEquivalentTokens = actualCost / sonnetCostPerToken` (where sonnet cost per token is derived from the configured Sonnet pricing)
- **Per-attempt model awareness**: task 08 introduces per-task model selection and retry escalation (a task may start with haiku and retry with sonnet/opus). Cost and rate limit calculations must use the actual model that ran each attempt, not a single model for the whole task. This is already handled if cost is calculated per-attempt (task 02), but the rate limit conversion must also use the correct per-attempt pricing
- Display estimated 5h window percentage after each task (alongside existing token summary)
- Display cumulative 5h window percentage in the grand total summary
- New config keys under `display` section:
  - `display.showRateLimitEstimate` (boolean, default: `true`): toggle showing the % estimate
  - `display.showCacheTokens` (boolean, default: `true`): toggle showing cache token counts in summaries
- New config key for the baseline cap:
  - `rateLimitWindow.sonnetTokenCap` (number, default: `88000`): the Sonnet-equivalent token cap for the 5h window
- The percentage is a rough estimate; make this clear in the display (e.g., "~42% of 5h window")
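
A minimal sketch of the cost-based conversion follows. It uses the simplified blended formula mentioned in this plan's notes (total cost divided by the average of Sonnet's input and output per-token price); the function names and the pricing shape are hypothetical, and the prices used in the usage note are example values only.

```typescript
interface SonnetPricing {
  inputPerMTok: number;  // USD per million input tokens
  outputPerMTok: number; // USD per million output tokens
}

// Blended approach: one per-token price, the mean of the input and output rates.
function sonnetEquivalentTokens(totalCostUsd: number, sonnet: SonnetPricing): number {
  const costPerToken = (sonnet.inputPerMTok + sonnet.outputPerMTok) / 2e6;
  return totalCostUsd / costPerToken;
}

// Percentage of the 5h window, with the cap defaulting to the documented 88000.
function windowPercentage(
  totalCostUsd: number,
  sonnet: SonnetPricing,
  sonnetTokenCap: number = 88000,
): number {
  return (sonnetEquivalentTokens(totalCostUsd, sonnet) / sonnetTokenCap) * 100;
}

// Tilde prefix signals that the figure is an estimate.
function formatWindowEstimate(percentage: number): string {
  return `~${Math.round(percentage)}% of 5h window`;
}
```

With example Sonnet prices of 3 and 15 USD per million tokens, the blended price is 9 USD per million, so a task costing 0.396 USD corresponds to about 44,000 Sonnet-equivalent tokens, or roughly half of an 88,000-token window. Because the input is a dollar cost that was already computed per attempt, mixed-model retries are weighted by whichever model actually ran.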

### Token Tracking for Plan Sessions (raf plan)
- After the `raf plan` interactive session ends, display a token usage summary (input/output tokens, cache, estimated cost, 5h window %)
- Approach: Claude CLI saves session data to `~/.claude/projects/<escaped-path>/<session-id>.jsonl`; each assistant message entry contains usage data (input_tokens, output_tokens, cache tokens, model name)
- Pass `--session-id <uuid>` to `runInteractive()` so we know exactly which session file to read after the session ends
- After `runInteractive()` returns, locate and parse the session JSONL file to extract and accumulate all usage data from assistant message entries
- The session file path is `~/.claude/projects/<escaped-project-path>/<session-id>.jsonl` where the project path is escaped by replacing `/` with `-`
- Reuse the existing `TokenTracker` and display formatters to show the summary
- This also applies to `raf plan --amend` sessions
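
The path escaping and JSONL accumulation could be sketched as below. The entry shape is an assumption: this plan says only that assistant entries carry usage data, so the `type: "assistant"` discriminator, the `message.usage` nesting, and the cache field names are guesses that would need checking against a real session file. The home directory is passed in so the sketch has no file-system or `os` dependency.

```typescript
interface SessionUsage {
  inputTokens: number;
  outputTokens: number;
  cacheReadTokens: number;
  cacheCreationTokens: number;
}

// Builds ~/.claude/projects/<escaped-project-path>/<session-id>.jsonl,
// escaping the project path by replacing every "/" with "-".
function sessionFilePath(homeDir: string, projectPath: string, sessionId: string): string {
  const escaped = projectPath.replace(/\//g, '-');
  return `${homeDir}/.claude/projects/${escaped}/${sessionId}.jsonl`;
}

// Sums usage across assistant entries; malformed lines are skipped, not fatal.
function accumulateSessionUsage(jsonl: string): SessionUsage {
  const total: SessionUsage = {
    inputTokens: 0,
    outputTokens: 0,
    cacheReadTokens: 0,
    cacheCreationTokens: 0,
  };
  for (const line of jsonl.split('\n')) {
    if (line.trim() === '') continue;
    let entry: any;
    try {
      entry = JSON.parse(line);
    } catch {
      continue; // malformed entry
    }
    // Assumed shape: { type: "assistant", message: { usage: { ... } } }
    const usage = entry?.message?.usage;
    if (entry?.type !== 'assistant' || !usage) continue;
    total.inputTokens += Number(usage.input_tokens) || 0;
    total.outputTokens += Number(usage.output_tokens) || 0;
    total.cacheReadTokens += Number(usage.cache_read_input_tokens) || 0;
    total.cacheCreationTokens += Number(usage.cache_creation_input_tokens) || 0;
  }
  return total;
}
```

A caller would read the file at `sessionFilePath(...)`, treat a missing file as zero usage with a warning, and hand the totals to the existing tracker and formatters.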

## Implementation Steps

### Rate Limit Estimation
1. Add new config types: `display` section with `showRateLimitEstimate` and `showCacheTokens` booleans; `rateLimitWindow` section with `sonnetTokenCap` number
2. Add defaults to `DEFAULT_CONFIG`, validation rules, and config accessor helpers
3. Update config-docs.md with the new keys
4. Implement the Sonnet-equivalent conversion in `TokenTracker`. The simplest approach: use the total estimated cost (already calculated) divided by the Sonnet cost-per-token to get Sonnet-equivalent tokens
5. Add a method to `TokenTracker` to compute cumulative 5h window percentage
6. Update `formatTaskTokenSummary()` to optionally append the window percentage
7. Update `formatTokenTotalSummary()` to optionally show the cumulative window percentage
8. Respect the `display.showRateLimitEstimate` and `display.showCacheTokens` config flags in the formatters

### Plan Session Token Tracking
9. Modify `runInteractive()` in `claude-runner.ts` to accept an optional `sessionId` parameter and pass it as `--session-id <uuid>` to the Claude CLI spawn
10. In `plan.ts` (both plan and amend flows), generate a UUID before calling `runInteractive()` and pass it
11. Create a utility to locate and parse the Claude session JSONL file: read `~/.claude/projects/<escaped-path>/<session-id>.jsonl`, extract usage data from all assistant message entries, and accumulate into a `UsageData` structure
12. After `runInteractive()` returns in `plan.ts`, call the session parser, feed results to `TokenTracker`, and display the summary using existing formatters
13. Handle edge cases: session file not found (Claude CLI may change storage), malformed entries, zero usage

### Tests
14. Add tests for the conversion logic, display formatting, and session file parsing

## Acceptance Criteria
- [ ] After each task, the token summary includes `~X% of 5h window` when enabled
- [ ] Grand total summary includes cumulative `~X% of 5h window` when enabled
- [ ] Percentage correctly reflects cost-weighted usage (Opus tasks consume more % than Haiku tasks for same raw token count)
- [ ] Multi-model tasks (retry escalation) correctly account for different models across attempts in both cost and rate limit calculations
- [ ] `display.showRateLimitEstimate: false` hides the percentage
- [ ] `display.showCacheTokens: false` hides cache read/create token counts from summaries
- [ ] `rateLimitWindow.sonnetTokenCap` correctly adjusts the denominator
- [ ] Config validation accepts the new keys
- [ ] Config docs updated with new keys and explanation
- [ ] After `raf plan` interactive session, a token usage summary is displayed
- [ ] After `raf plan --amend` interactive session, a token usage summary is displayed
- [ ] Session file parsing handles missing/malformed files gracefully (warn, don't crash)
- [ ] Tests cover the conversion math, display toggling, and session file parsing

## Notes
- The percentage is inherently an estimate; the actual Anthropic rate limit algorithm may differ. The display should communicate this (tilde prefix).
- The conversion can be simplified by reusing the already-computed dollar cost: `sonnetEquivalentTokens = totalCost / ((sonnetInputPrice + sonnetOutputPrice) / 2M)`. But a more accurate approach would normalize input and output tokens separately since they have different price ratios. Consider which approach is more appropriate.
- This task depends on task 02 (fix mixed-attempt cost) because accurate cost calculation is the foundation for accurate percentage estimation.
- Task 08 introduces per-task model selection and retry escalation (cheaper model on first attempt, ceiling model on retry). The per-attempt cost calculation from task 02 already handles different models per attempt via `modelUsage`, but the Sonnet-equivalent conversion for rate limit % must also respect per-attempt model differences. Since conversion is derived from cost (which is already per-attempt), this should work naturally, but verify with a test case covering a multi-model retry scenario.
@@ -0,0 +1,32 @@
# Task: Show RAF Version and Model in `raf do` Logs

## Objective
Display the RAF version and execution model in a single combined line at the start of `raf do` execution.

## Context
Currently `raf do` logs don't prominently show what version of RAF is running or which model is being used for execution. This information is useful for debugging and for users to confirm their setup. The model name should be shown in its full format (e.g., `claude-opus-4-6` rather than just `opus`).

## Requirements
- Display a single combined line at the start of task execution, before any tasks run
- Format: `RAF v{version} | Model: {fullModelId}`
- Version comes from `package.json` via the existing `getVersion()` utility
- Model should be the full model ID (e.g., `claude-opus-4-6`), not the short alias
- Use the existing logger formatting (e.g., `logger.info` or appropriate level)
- Do NOT show effort level; the `effort.*` config is being removed in task 08

## Implementation Steps
1. In `src/commands/do.ts`, add a log line at the start of the execution flow (before the first task begins)
2. Resolve the full model ID: if the config uses a short alias like `opus`, resolve it to the full model ID
3. Format and display the combined line
4. Ensure this appears in both worktree and non-worktree execution modes
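
As a rough illustration of steps 2 and 3, the banner can be built by a pure formatting function. The alias table here is hypothetical and holds only the one mapping this plan mentions; the real resolution logic may live elsewhere in the codebase, as the notes in this plan suggest.

```typescript
// Hypothetical alias table; only the opus example appears in the plan.
const MODEL_ALIASES: Record<string, string> = {
  opus: 'claude-opus-4-6',
};

// Short aliases resolve to full IDs; anything else is passed through unchanged.
function resolveFullModelId(model: string): string {
  return MODEL_ALIASES[model] ?? model;
}

// Produces the combined line in the required format.
function formatVersionLine(version: string, model: string): string {
  return `RAF v${version} | Model: ${resolveFullModelId(model)}`;
}
```

Keeping the formatter free of logger and config calls makes it easy to assert on in a unit test and to call from both the worktree and non-worktree code paths.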
|
|
22
|
+
|
|
23
|
+
## Acceptance Criteria
|
|
24
|
+
- [ ] A version/model line appears at the start of every `raf do` execution
|
|
25
|
+
- [ ] Model name is shown in full format (e.g., `claude-opus-4-6`)
|
|
26
|
+
- [ ] Line appears before any task execution output
|
|
27
|
+
- [ ] Works in both worktree and non-worktree modes
|
|
28
|
+
|
|
29
|
+
## Notes
|
|
30
|
+
- The existing `getVersion()` utility is in `src/utils/version.ts`.
|
|
31
|
+
- Model resolution from short alias to full ID may already exist in the codebase — check how `ClaudeRunner` resolves model names.
|
|
32
|
+
- Keep the display subtle (dim or info level) so it doesn't clutter the output.
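
A minimal sketch of the combined line, assuming a hypothetical alias table (only the `opus` entry is taken from the example above; the real resolution logic may live elsewhere, for example in `ClaudeRunner`). The names `MODEL_ALIASES`, `resolveFullModelId`, and `formatVersionLine` are illustrative and not part of the codebase.

```typescript
// Hypothetical alias table: only the `opus` entry comes from the plan's example.
const MODEL_ALIASES = new Map<string, string>([["opus", "claude-opus-4-6"]]);

// Resolve a short alias to a full model ID; unrecognized values pass through unchanged.
function resolveFullModelId(model: string): string {
  return MODEL_ALIASES.get(model) ?? model;
}

// Build the banner in the `RAF v{version} | Model: {fullModelId}` format.
function formatVersionLine(version: string, model: string): string {
  return `RAF v${version} | Model: ${resolveFullModelId(model)}`;
}
```

In `do.ts` the result would be passed to the existing logger once, before the first task starts, with the version string taken from `getVersion()`.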
|
|
@@ -0,0 +1,40 @@
|
|
|
1
|
+
# Task: Sync Main Branch Before Worktree/PR Operations
|
|
2
|
+
|
|
3
|
+
## Objective
|
|
4
|
+
Automatically push main to remote before creating a PR and pull main from remote before creating a git worktree, with a configurable toggle.
|
|
5
|
+
|
|
6
|
+
## Context
|
|
7
|
+
When working with worktrees, the worktree is branched from the current state of the main branch. If main is behind the remote, the worktree starts from stale code. Similarly, before creating a PR, the main branch should be pushed to ensure the remote has the latest state for the PR base. Auto-detecting the main branch (from `origin/HEAD`) avoids hardcoding assumptions about branch naming.
|
|
8
|
+
|
|
9
|
+
## Requirements
|
|
10
|
+
- Before creating a worktree (`raf plan --worktree` or `raf do --worktree`): pull the main branch from remote to ensure the worktree starts from the latest code
|
|
11
|
+
- Before creating a PR (post-execution "Create PR" action): push the main branch to remote so the PR base is up to date
|
|
12
|
+
- Auto-detect the main branch name from `refs/remotes/origin/HEAD` (the same detection logic used in `pull-request.ts` via `detectBaseBranch()`)
|
|
13
|
+
- New config key: `syncMainBranch` (boolean, default: `true`)
|
|
14
|
+
- When `syncMainBranch` is `false`, skip both push and pull operations
|
|
15
|
+
- Handle failures gracefully: if push/pull fails (e.g., no remote, auth issues), warn but don't block the operation
|
|
16
|
+
|
|
17
|
+
## Implementation Steps
|
|
18
|
+
1. Add `syncMainBranch` config key to `RafConfig` interface, `DEFAULT_CONFIG`, validation, and config-docs.md
|
|
19
|
+
2. Add `getSyncMainBranch()` accessor in `src/utils/config.ts`
|
|
20
|
+
3. Reuse or extract `detectBaseBranch()` from `src/core/pull-request.ts` into a shared utility (it's already used for PR base detection)
|
|
21
|
+
4. Add a `syncMainBranch()` utility function that pulls main before worktree creation
|
|
22
|
+
5. Integrate pull into the worktree creation flow in `src/core/worktree.ts` or the calling code in `do.ts`/`plan` command
|
|
23
|
+
6. Integrate push into the PR creation flow — before `createPullRequest()` is called, push main
|
|
24
|
+
7. Add appropriate logging (info level) when syncing occurs
|
|
25
|
+
8. Handle errors: catch failures, log warning, continue with the operation
|
|
26
|
+
9. Update config-docs.md
|
|
27
|
+
|
|
28
|
+
## Acceptance Criteria
|
|
29
|
+
- [ ] Main branch is pulled from remote before worktree creation (when `syncMainBranch: true`)
|
|
30
|
+
- [ ] Main branch is pushed to remote before PR creation (when `syncMainBranch: true`)
|
|
31
|
+
- [ ] Main branch name is auto-detected from `origin/HEAD`
|
|
32
|
+
- [ ] `syncMainBranch: false` skips both operations
|
|
33
|
+
- [ ] Failures in push/pull produce warnings but don't block the workflow
|
|
34
|
+
- [ ] Config validation accepts the new key
|
|
35
|
+
- [ ] Config docs updated
|
|
36
|
+
|
|
37
|
+
## Notes
|
|
38
|
+
- `detectBaseBranch()` in `src/core/pull-request.ts` already handles the `origin/HEAD` detection with fallback to `main`/`master`. Reuse this logic rather than duplicating it.
|
|
39
|
+
- The pull should only pull the main branch, not the current branch or all branches.
|
|
40
|
+
- Be careful about the pull: if the user has uncommitted changes on main, a pull could fail. Consider using `git fetch origin main && git merge --ff-only origin/main` (substituting the auto-detected branch name for `main`), which fails cleanly if main has diverged.
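
The pull-before-worktree flow could be sketched as follows. This is an illustration under stated assumptions, not the actual implementation: `parseOriginHead`, `buildPullSteps`, and the injected `runGit` callback are hypothetical names, and the sketch assumes the main branch is currently checked out so that the fast-forward merge applies to it.

```typescript
// Extract the branch name from the output of `git symbolic-ref refs/remotes/origin/HEAD`,
// e.g. "refs/remotes/origin/main" -> "main". Returns null for any other shape.
function parseOriginHead(symbolicRef: string): string | null {
  const prefix = "refs/remotes/origin/";
  const ref = symbolicRef.trim();
  if (!ref.startsWith(prefix) || ref.length === prefix.length) return null;
  return ref.slice(prefix.length);
}

// Build the fetch + fast-forward-only merge steps as git argument lists.
function buildPullSteps(branch: string): string[][] {
  return [
    ["fetch", "origin", branch],
    ["merge", "--ff-only", `origin/${branch}`],
  ];
}

type SyncResult = { ok: true } | { ok: false; warning: string };

// Run the steps through an injected runner (hypothetical). Any failure becomes
// a warning so the caller can log it and continue with worktree creation.
function syncMainBranch(
  enabled: boolean,
  branch: string,
  runGit: (args: string[]) => void,
): SyncResult {
  if (!enabled) return { ok: true }; // syncMainBranch: false skips the sync
  try {
    for (const step of buildPullSteps(branch)) runGit(step);
    return { ok: true };
  } catch (err) {
    const reason = err instanceof Error ? err.message : String(err);
    return { ok: false, warning: `Could not sync ${branch}: ${reason}` };
  }
}
```

Injecting the runner keeps the warn-and-continue behavior testable without a real repository; a real caller would pass a function that shells out to `git`.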
|
|
@@ -0,0 +1,61 @@
|
|
|
1
|
+
# Task: Sync README with Codebase (Critical Items)
|
|
2
|
+
|
|
3
|
+
## Objective
|
|
4
|
+
Fix critical discrepancies between README.md and the actual codebase implementation.
|
|
5
|
+
|
|
6
|
+
## Context
|
|
7
|
+
The README documents features that no longer exist (like `--merge` flag) and is missing documentation for major features (post-execution picker, PR creation). This causes user confusion and makes the tool harder to adopt.
|
|
8
|
+
|
|
9
|
+
## Dependencies
|
|
10
|
+
01, 05
|
|
11
|
+
|
|
12
|
+
## Requirements
|
|
13
|
+
Fix these three critical discrepancies:
|
|
14
|
+
|
|
15
|
+
### 1. Remove `--merge` flag references
|
|
16
|
+
The `--merge` CLI flag for `raf do` is documented in the README but does not exist in the code. It was replaced by an interactive post-execution action picker. All references to `--merge` must be removed and replaced with the actual behavior.
|
|
17
|
+
|
|
18
|
+
Affected locations in README:
|
|
19
|
+
- Usage examples showing `raf do my-feature -w --merge`
|
|
20
|
+
- Command Reference table listing `--merge` as a flag
|
|
21
|
+
- Any other mentions of `--merge`
|
|
22
|
+
|
|
23
|
+
### 2. Document the post-execution action picker
|
|
24
|
+
When running `raf do` in worktree mode, an interactive picker appears BEFORE task execution asking what to do after tasks complete. The three options are:
|
|
25
|
+
- **Merge** — merge branch into the original branch (fast-forward preferred, merge-commit fallback)
|
|
26
|
+
- **Create PR** — push branch and create a GitHub PR
|
|
27
|
+
- **Leave branch** — keep the branch as-is, do nothing
|
|
28
|
+
|
|
29
|
+
This is implemented in `src/commands/do.ts` via `pickPostExecutionAction()`. On task failure, the chosen post-action is skipped. After successful post-actions (merge, PR, leave), the worktree directory is cleaned up automatically (the git branch is preserved).
|
|
30
|
+
|
|
31
|
+
### 3. Document PR creation from worktree
|
|
32
|
+
The "Create PR" post-execution action is a significant feature not mentioned in the README at all. It:
|
|
33
|
+
- Requires `gh` CLI installed and authenticated
|
|
34
|
+
- Auto-detects the base branch from `origin/HEAD`
|
|
35
|
+
- Generates a PR title from the project name
|
|
36
|
+
- Generates a PR body using Claude summarizing input.md, decisions.md, and outcomes
|
|
37
|
+
- Auto-pushes the branch to origin if needed
|
|
38
|
+
- Runs preflight checks; falls back to "leave branch" if `gh` is missing or unauthenticated
|
|
39
|
+
|
|
40
|
+
Also fix the worktree cleanup description — the README currently says worktrees persist and need manual cleanup, but they're actually auto-cleaned after post-actions (only the git branch is preserved). On failure, the worktree IS kept for inspection.
|
|
41
|
+
|
|
42
|
+
## Implementation Steps
|
|
43
|
+
1. Read the current README.md thoroughly
|
|
44
|
+
2. Remove all `--merge` flag references from examples and command reference
|
|
45
|
+
3. Update the Worktree Mode section to describe the post-execution picker flow
|
|
46
|
+
4. Add documentation about PR creation capability and its requirements (`gh` CLI)
|
|
47
|
+
5. Fix the worktree cleanup description to reflect auto-cleanup behavior
|
|
48
|
+
6. Ensure examples in the README use valid, existing flags only
|
|
49
|
+
7. Review the updated text for consistency and accuracy
|
|
50
|
+
|
|
51
|
+
## Acceptance Criteria
|
|
52
|
+
- [ ] No references to `--merge` flag remain in README
|
|
53
|
+
- [ ] Post-execution action picker is documented with all three options
|
|
54
|
+
- [ ] PR creation from worktree is documented including prerequisites
|
|
55
|
+
- [ ] Worktree cleanup behavior is accurately described
|
|
56
|
+
- [ ] All CLI examples use valid, existing flags
|
|
57
|
+
- [ ] README reads naturally and doesn't feel patched
|
|
58
|
+
|
|
59
|
+
## Notes
|
|
60
|
+
- This task depends on 01 (remove `claudeCommand`) and 05 (sync main branch) because those tasks add/change config keys that should be reflected if mentioned in README.
|
|
61
|
+
- Only fix the critical items listed above. Medium and low priority discrepancies (missing verbose flag in table, blocked symbol, token tracking docs, effort/pricing docs) are out of scope for this task.
|
|
@@ -0,0 +1,28 @@
|
|
|
1
|
+
# Task: Add --no-session-persistence to Throwaway Claude Calls
|
|
2
|
+
|
|
3
|
+
## Objective
|
|
4
|
+
Prevent PR body generation and failure analysis Claude calls from polluting the user's session history.
|
|
5
|
+
|
|
6
|
+
## Context
|
|
7
|
+
Claude CLI saves every session to disk by default, making them appear in `claude --resume`. Throwaway utility calls (PR body generation, failure analysis) clutter this history with sessions the user will never want to resume. The name generation utility already solved this by adding `--no-session-persistence` to its `spawn()` call (implemented in the token-reaper project). The same pattern should be applied to the remaining throwaway Claude invocations.
|
|
8
|
+
|
|
9
|
+
## Requirements
|
|
10
|
+
- Add `--no-session-persistence` flag to the `spawn()` call in `callClaudeForPrBody()` in `src/core/pull-request.ts`
|
|
11
|
+
- Add `--no-session-persistence` flag to the `spawn()` call in the failure analyzer in `src/core/failure-analyzer.ts`
|
|
12
|
+
- Both already use `-p` (print mode), which is required for `--no-session-persistence` to work
|
|
13
|
+
- Follow the exact same pattern used in `src/utils/name-generator.ts`
|
|
14
|
+
|
|
15
|
+
## Implementation Steps
|
|
16
|
+
1. In `src/core/pull-request.ts`, add `'--no-session-persistence'` to the args array in `callClaudeForPrBody()`
|
|
17
|
+
2. In `src/core/failure-analyzer.ts`, add `'--no-session-persistence'` to the args array in the Claude spawn call
|
|
18
|
+
3. Verify both functions still work correctly — the flag should be transparent to the output
|
|
19
|
+
|
|
20
|
+
## Acceptance Criteria
|
|
21
|
+
- [ ] PR body generation sessions don't appear in `claude --resume`
|
|
22
|
+
- [ ] Failure analysis sessions don't appear in `claude --resume`
|
|
23
|
+
- [ ] Both features still function correctly (output unchanged)
|
|
24
|
+
- [ ] Pattern matches the existing implementation in `name-generator.ts`
|
|
25
|
+
|
|
26
|
+
## Notes
|
|
27
|
+
- This is a minimal two-line change (one per file). The flag is well-tested in name-generator.ts already.
|
|
28
|
+
- The `--no-session-persistence` flag only works with `-p` (print mode), which both call sites already use.
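
As an illustration of the change, the helper below adds the flag to an argument list only when `-p` is present. The helper name `withNoSessionPersistence` and the argument layout are assumptions for this sketch; the actual call sites may simply add the string to their existing args arrays.

```typescript
// Return a copy of the CLI args with `--no-session-persistence` prepended.
// The flag only works in print mode, so args without `-p` are returned unchanged.
function withNoSessionPersistence(args: readonly string[]): string[] {
  const flag = "--no-session-persistence";
  if (!args.includes("-p") || args.includes(flag)) return [...args];
  return [flag, ...args];
}
```

The returned array would then be handed to `spawn()` in place of the original args.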
|
|
@@ -0,0 +1,123 @@
|
|
|
1
|
+
# Task: Add Per-Task Execution Metadata + Remove Effort Config
|
|
2
|
+
|
|
3
|
+
## Objective
|
|
4
|
+
Add Obsidian-style frontmatter to plan files with required effort metadata, introduce an effort-to-model mapping config, redefine `models.execute` as a ceiling, remove the legacy `effort.*` config section entirely, and relax planning prompt restrictions.
|
|
5
|
+
|
|
6
|
+
## Context
|
|
7
|
+
The philosophy is "plan with a smart model, execute with a less smart model when possible." Currently, all tasks execute with the same globally-configured model. By adding frontmatter metadata to plan files during the planning stage, the planner (typically Opus) can assess each task's complexity and recommend the appropriate execution model.
|
|
8
|
+
|
|
9
|
+
The config's role shifts from "pick one model for all tasks" to "set the budget ceiling." The planner recommends effort per task, which maps to a model via `effortMapping`. The final model is capped by `models.execute` (the ceiling). This gives users budget control while letting the planner differentiate tasks.
|
|
10
|
+
|
|
11
|
+
The existing `effort.*` config section (which maps to Claude CLI's `--effort` flag via `CLAUDE_CODE_EFFORT_LEVEL` env var) should be removed entirely — it's a separate concept from the task complexity "effort" label in plan frontmatter.
|
|
12
|
+
|
|
13
|
+
Additionally, the planning prompts currently contain restrictive wording that discourages implementation details in plans. This wording should be removed to allow the planning model to include whatever level of detail it deems appropriate.
|
|
14
|
+
|
|
15
|
+
## Dependencies
|
|
16
|
+
04
|
|
17
|
+
|
|
18
|
+
## Requirements
|
|
19
|
+
|
|
20
|
+
### Frontmatter Metadata in Plan Files
|
|
21
|
+
- Plan files MUST have Obsidian-style properties at the top, before the `# Task:` heading
|
|
22
|
+
- Format uses only a closing `---` delimiter (no opening delimiter):
|
|
23
|
+
```
|
|
24
|
+
effort: medium
|
|
25
|
+
---
|
|
26
|
+
# Task: ...
|
|
27
|
+
```
|
|
28
|
+
- `effort` is REQUIRED — a human-readable task complexity label (low/medium/high) that maps to a model. NOT Claude's `--effort` flag
|
|
29
|
+
- `model` is OPTIONAL — an explicit model override (short alias or full model ID) that bypasses the effort mapping entirely
|
|
30
|
+
- If both `model` and `effort` are present, `model` takes precedence over the effort mapping
|
|
31
|
+
- If frontmatter is missing (e.g., manually created plans), warn and fall back to the config default model
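
A lenient parser for this format might look like the sketch below. The names `PlanFrontmatter` and `parsePlanFrontmatter` are hypothetical, and returning `null` is one assumed way of signalling "no frontmatter" so the caller can warn and fall back to the config model.

```typescript
interface PlanFrontmatter {
  effort?: "low" | "medium" | "high";
  model?: string;
}

const EFFORT_LABELS = ["low", "medium", "high"] as const;

// Parse `key: value` lines that precede a closing `---` line (no opening delimiter).
// Returns null when no usable frontmatter is found.
function parsePlanFrontmatter(content: string): PlanFrontmatter | null {
  const lines = content.split(/\r?\n/);
  const end = lines.findIndex((line) => line.trim() === "---");
  if (end <= 0) return null; // no delimiter, or nothing before it

  const result: PlanFrontmatter = {};
  for (const line of lines.slice(0, end)) {
    if (line.trim() === "") continue;
    const match = /^([A-Za-z_][\w-]*):\s*(.*)$/.exec(line.trim());
    if (!match) return null; // malformed property: treat as "no frontmatter"
    const [, key, rawValue = ""] = match;
    const value = rawValue.trim();
    if (key === "effort") {
      const label = EFFORT_LABELS.find((l) => l === value);
      if (label) result.effort = label; // invalid labels are dropped
    } else if (key === "model" && value !== "") {
      result.model = value;
    }
    // Unknown keys are ignored.
  }
  return result.effort || result.model ? result : null;
}
```

Because a heading such as `# Task: ...` does not match the `key: value` pattern, a plan that has no frontmatter but contains a `---` rule later in its body is still treated as having none.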
|
|
32
|
+
|
|
33
|
+
### Effort-to-Model Mapping
|
|
34
|
+
- New config section: `effortMapping` that maps complexity labels to model names
|
|
35
|
+
- Default mapping: `{ low: "haiku", medium: "sonnet", high: "opus" }`
|
|
36
|
+
- When a plan has `effort: medium`, RAF resolves it to the model from the mapping (e.g., sonnet)
|
|
37
|
+
- The mapping values follow the same validation as model names (short aliases or full model IDs)
|
|
38
|
+
- Add to `DEFAULT_CONFIG`, validation, config-docs.md
|
|
39
|
+
|
|
40
|
+
### Config as Ceiling + Fallback
|
|
41
|
+
- `models.execute` serves dual purpose:
|
|
42
|
+
1. **Ceiling**: the maximum model tier allowed for task execution
|
|
43
|
+
2. **Fallback**: the model used when a plan has no effort frontmatter (e.g., manually created plans, legacy plans)
|
|
44
|
+
- Model tier ordering: haiku < sonnet < opus (based on pricing — cheaper = lower tier)
|
|
45
|
+
- When frontmatter IS present: final model = `min(resolved_model, models.execute)` where "min" means the cheaper/lower-tier model
|
|
46
|
+
- When frontmatter is MISSING: final model = `models.execute` directly (with a warning about missing frontmatter)
|
|
47
|
+
- Example: if `models.execute: "sonnet"` (ceiling) and plan says `effort: high` (maps to opus), the task runs with sonnet (capped)
|
|
48
|
+
- Example: if `models.execute: "sonnet"` and plan says `effort: low` (maps to haiku), the task runs with haiku (under ceiling, no cap)
|
|
49
|
+
- Example: if plan has no frontmatter, the task runs with sonnet (fallback)
|
|
50
|
+
- The explicit `model` field in frontmatter is ALSO subject to the ceiling — no way to exceed the config ceiling from a plan file
|
|
51
|
+
- **Retry escalation**: when a task fails and is retried, bump the model to `models.execute` (the ceiling) for the retry attempt. If the first attempt already used the ceiling model, retry with the same model. This gives failing tasks the best available model on subsequent attempts
|
|
52
|
+
- Implement a `getModelTier()` utility that returns a numeric tier for comparison (using pricing ratios or a simple ordered list)
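
The ceiling, fallback, and retry rules above can be sketched in one resolver. This is an assumed shape, not the shipped code: `resolveTaskModel`, `TaskModelInput`, and the substring-based family detection are illustrative choices, while the tier ordering and default mapping come from the requirements.

```typescript
type Effort = "low" | "medium" | "high";

// Default mapping from the requirements; a user config could override it.
const DEFAULT_EFFORT_MAPPING: Record<Effort, string> = {
  low: "haiku",
  medium: "sonnet",
  high: "opus",
};

const TIER_ORDER = ["haiku", "sonnet", "opus"];

// Rank a model by family: haiku=1, sonnet=2, opus=3. Full IDs such as
// `claude-opus-4-6` are matched by family name. Unknown families get the highest rank.
function getModelTier(model: string): number {
  const name = model.toLowerCase();
  const index = TIER_ORDER.findIndex((family) => name.includes(family));
  return index === -1 ? TIER_ORDER.length : index + 1;
}

interface TaskModelInput {
  effort?: Effort;
  model?: string; // explicit override from frontmatter
  ceiling: string; // value of models.execute
  attempt: number; // 1 for the first run
}

function resolveTaskModel(
  input: TaskModelInput,
  mapping: Record<Effort, string> = DEFAULT_EFFORT_MAPPING,
): string {
  const { effort, model, ceiling, attempt } = input;
  if (attempt > 1) return ceiling; // retry escalation
  const requested = model ?? (effort ? mapping[effort] : undefined);
  if (requested === undefined) return ceiling; // fallback: no frontmatter
  // "min" by tier: keep the requested model unless it exceeds the ceiling.
  return getModelTier(requested) > getModelTier(ceiling) ? ceiling : requested;
}
```

With `ceiling: "sonnet"`, a plan marked `effort: high` resolves to `sonnet`, while `effort: low` stays on `haiku`, matching the examples in the list above.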
|
|
53
|
+
|
|
54
|
+
### Remove Legacy Effort Config
|
|
55
|
+
- Remove the entire `effort.*` config section from `RafConfig` interface in `src/types/config.ts`
|
|
56
|
+
- Remove `EffortConfig`, `EffortScenario`, `EffortLevel` types, `VALID_EFFORTS` constant
|
|
57
|
+
- Remove `getEffort()` accessor from `src/utils/config.ts`
|
|
58
|
+
- Remove effort validation logic from config validation
|
|
59
|
+
- Remove `effortLevel` option from `ClaudeRunner` run options in `src/core/claude-runner.ts`
|
|
60
|
+
- Remove `CLAUDE_CODE_EFFORT_LEVEL` env var injection from `ClaudeRunner`
|
|
61
|
+
- Remove all `getEffort()` call sites:
|
|
62
|
+
- `src/commands/do.ts` — `getEffort('execute')` passed to runner options
|
|
63
|
+
- `src/commands/config.ts` — `getEffort('config')` and its fallback
|
|
64
|
+
- Any other call sites
|
|
65
|
+
- Remove effort from `src/prompts/config-docs.md`
|
|
66
|
+
- Update tests that reference effort config
|
|
67
|
+
|
|
68
|
+
### Planning Prompt Changes
|
|
69
|
+
- Remove the "Plan Output Style" section that says "CRITICAL: Plans should be HIGH-LEVEL and CONCEPTUAL" from both `src/prompts/planning.ts` and `src/prompts/amend.ts`
|
|
70
|
+
- Remove the restrictive bullet points: "Describe WHAT needs to be done, not HOW to code it", "NO code snippets or implementation details in plans"
|
|
71
|
+
- Replace with neutral guidance: plans can include whatever level of detail the planner deems helpful
|
|
72
|
+
- Add instructions that the planner MUST include effort frontmatter on every task, with guidance on how to assess complexity:
|
|
73
|
+
- `low` — trivial/mechanical changes, simple one-file edits, config changes
|
|
74
|
+
- `medium` — well-scoped feature work, bug fixes with clear plans, multi-file changes following existing patterns
|
|
75
|
+
- `high` — architectural changes, complex logic, tasks requiring deep codebase understanding
|
|
76
|
+
- Document the frontmatter format (Obsidian-style, closing `---` only) in the prompt
|
|
77
|
+
|
|
78
|
+
## Implementation Steps
|
|
79
|
+
1. Remove the legacy effort config: delete `EffortConfig`, `EffortScenario`, `EffortLevel` types, `VALID_EFFORTS`, `getEffort()`, the `effort` key from `RafConfig` and `DEFAULT_CONFIG`, effort validation, `effortLevel` from `ClaudeRunner` run options, `CLAUDE_CODE_EFFORT_LEVEL` env var logic
|
|
80
|
+
2. Remove all `getEffort()` call sites in `do.ts`, `config.ts`, and anywhere else
|
|
81
|
+
3. Add new `effortMapping` config section: type definition, defaults (`{ low: "haiku", medium: "sonnet", high: "opus" }`), validation (values must be valid model names), accessor helper
|
|
82
|
+
4. Implement model tier comparison: a `getModelTier()` utility that returns a numeric rank based on the model family (haiku=1, sonnet=2, opus=3). For full model IDs, extract the family name. For unknown models, default to highest tier (no cap)
|
|
83
|
+
5. Redefine `models.execute` semantics: it now acts as ceiling + fallback. Update its documentation to reflect this
|
|
84
|
+
6. Create a frontmatter parser utility that extracts `model` and `effort` from plan file content (parse `key: value` lines before the closing `---` delimiter)
|
|
85
|
+
7. Integrate frontmatter parsing into `state-derivation.ts` alongside the existing `parseDependencies()` — store parsed metadata on the task state object
|
|
86
|
+
8. In `do.ts`, before each task execution, resolve the per-task model:
|
|
87
|
+
- Read frontmatter `model` or resolve `effort` via `effortMapping`
|
|
88
|
+
- Apply ceiling: `min(resolved_model, models.execute)` using tier comparison
|
|
89
|
+
- Fall back to `models.execute` if no frontmatter (fallback role)
|
|
90
|
+
9. Implement retry escalation: when retrying a failed task, use `models.execute` (ceiling) instead of the original resolved model. The retry logic in `do.ts` should detect attempt > 1 and escalate
|
|
91
|
+
10. Modify `ClaudeRunner` or the task execution loop to support per-task model (consider creating a new runner instance per task if the model differs)
|
|
92
|
+
11. Update `src/prompts/planning.ts`: remove the restrictive "Plan Output Style" section, replace it with neutral guidance, and add the required frontmatter format instructions with effort assessment criteria
|
|
93
|
+
12. Update `src/prompts/amend.ts`: apply the same prompt changes
|
|
94
|
+
13. Update config-docs.md: remove effort section, add effortMapping section, update models.execute description to "ceiling"
|
|
95
|
+
14. Update CLAUDE.md: update "Plan File Structure" to include frontmatter, remove effort references, document ceiling behavior
|
|
96
|
+
15. Update/remove affected tests
|
|
97
|
+
|
|
98
|
+
## Acceptance Criteria
|
|
99
|
+
- [ ] The entire `effort.*` config section is removed (types, defaults, validation, accessors, env var)
|
|
100
|
+
- [ ] `ClaudeRunner` no longer sets `CLAUDE_CODE_EFFORT_LEVEL`
|
|
101
|
+
- [ ] Existing config files with `effort` are handled gracefully (warning or silent ignore)
|
|
102
|
+
- [ ] `effortMapping` config exists with sensible defaults (low→haiku, medium→sonnet, high→opus)
|
|
103
|
+
- [ ] `models.execute` acts as a ceiling — resolved model is capped to this tier
|
|
104
|
+
- [ ] Ceiling works correctly: opus plan + sonnet ceiling = sonnet execution
|
|
105
|
+
- [ ] Under-ceiling works correctly: haiku plan + sonnet ceiling = haiku execution
|
|
106
|
+
- [ ] Retry escalation: failed task retries use the ceiling model
|
|
107
|
+
- [ ] Plan files with frontmatter are parsed correctly (effort and optional model extracted)
|
|
108
|
+
- [ ] Plan files without frontmatter produce a warning and fall back to config model
|
|
109
|
+
- [ ] Effort label in frontmatter correctly maps to a model via `effortMapping`
|
|
110
|
+
- [ ] Explicit `model` in frontmatter takes precedence over `effort` mapping but is still subject to ceiling
|
|
111
|
+
- [ ] Planning prompts no longer restrict implementation details
|
|
112
|
+
- [ ] Planning prompts mandate effort frontmatter with assessment guidelines
|
|
113
|
+
- [ ] Invalid frontmatter values produce a warning but don't block execution
|
|
114
|
+
- [ ] Frontmatter parsing doesn't break existing plan files (backwards compatible)
|
|
115
|
+
- [ ] Tests cover effort removal, effortMapping, ceiling logic, frontmatter parsing, and override logic
|
|
116
|
+
|
|
117
|
+
## Notes
|
|
118
|
+
- The frontmatter parser should be lenient: ignore unknown keys, handle missing `---` delimiter gracefully, treat malformed properties as "no frontmatter" rather than erroring. The format is Obsidian-style: `key: value` lines at the top of the file, terminated by a `---` line (no opening delimiter).
|
|
119
|
+
- This task depends on task 04 (version/model display) since the per-task model override should be visible in the execution log line.
|
|
120
|
+
- The `parseDependencies()` function in `state-derivation.ts` already reads plan file content — the frontmatter parser can be called at the same point, avoiding a second file read.
|
|
121
|
+
- Removing effort config is a breaking change for users who have `effort.*` in their config file. The config validator should handle this gracefully (warn about unknown key, don't crash).
|
|
122
|
+
- Model tier comparison for full model IDs: extract the family name (e.g., `claude-opus-4-6` → `opus`) and use the same tier ordering. Unknown families should default to the highest tier so they're never accidentally capped.
|
|
123
|
+
- The ceiling concept also applies to the explicit `model` frontmatter field — a plan cannot exceed the user's configured budget. This is intentional: the user always has final say on cost.
|
|
@@ -0,0 +1,37 @@
|
|
|
1
|
+
# Project Decisions
|
|
2
|
+
|
|
3
|
+
## For `raf config --get/--set`: Should `--get` with no key show the full merged config (defaults + overrides), or only user overrides from the config file?
|
|
4
|
+
Full merged config — Shows the complete resolved config with all defaults filled in, so users see every active setting.
|
|
5
|
+
|
|
6
|
+
## For `raf config --set`: Should setting a value to its default automatically remove it from the config file (keeping config minimal), or always write explicitly?
|
|
7
|
+
Remove if default — If the value matches the default, remove the key from the config file to keep it clean.
|
|
8
|
+
|
|
9
|
+
## For the diverged main branch fix: Should callers treat this as a hard error or a warning?
|
|
10
|
+
Warning only — Show a visible warning (yellow) but continue execution. Stale base is better than blocking work.
|
|
11
|
+
|
|
12
|
+
## For the name generation fix: Should the fix tighten the prompt, add response parsing, or both?
|
|
13
|
+
Prompt only — Just improve the prompt instructions to be more explicit about output format.
|
|
14
|
+
|
|
15
|
+
## For `raf config --get models.plan`: Should the output be plain value only or formatted?
|
|
16
|
+
Plain value — Just print 'opus'. Easy to pipe into other commands.
|
|
17
|
+
|
|
18
|
+
## For the token investigation task: Research only or plan fix now?
|
|
19
|
+
Research now and create task if needed — investigated and found it's a display issue: `input_tokens` from Claude API only reports non-cached tokens. Fix plan created as task 07 to show total input in display.
|
|
20
|
+
|
|
21
|
+
## Token investigation findings
|
|
22
|
+
Root cause: the Claude API reports `input_tokens` (non-cached, e.g. 3-22) separately from `cache_read_input_tokens` (e.g. 18,000+) and `cache_creation_input_tokens` (e.g. 6,000+). RAF displays only the non-cached count as "X in". Fix: sum all three for the display. Cost calculation is already correct.
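
The display fix amounts to summing the three counters. A minimal sketch, assuming a hypothetical `Usage` shape and helper name (only the three field names come from the findings above):

```typescript
interface Usage {
  input_tokens: number;
  cache_read_input_tokens?: number;
  cache_creation_input_tokens?: number;
}

// Total input shown to the user: non-cached + cache reads + cache writes.
function totalInputTokens(usage: Usage): number {
  return (
    usage.input_tokens +
    (usage.cache_read_input_tokens ?? 0) +
    (usage.cache_creation_input_tokens ?? 0)
  );
}
```

For example, 22 non-cached tokens with 18,000 cache-read and 6,000 cache-creation tokens would display as 24,022 in, rather than 22.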
|
|
23
|
+
|
|
24
|
+
## For amend commit fix: Should plan files be removed from the commit or committed separately?
|
|
25
|
+
Remove from commit — Only commit input.md and decisions.md during planning. Plan files get committed by Claude during execution.
|
|
26
|
+
|
|
27
|
+
## For syncing worktree branch with main before execution: Rebase or merge?
|
|
28
|
+
Rebase onto main — Cleaner history, replays worktree commits on top of latest main.
|
|
29
|
+
|
|
30
|
+
## If the rebase has conflicts, should `raf do` abort or skip sync?
|
|
31
|
+
Skip sync and warn — Skip the sync step, show a warning, and continue execution on the current branch state.
|
|
32
|
+
|
|
33
|
+
## For removing token report from plan: Should `session-parser.ts` and its tests be deleted or kept?
|
|
34
|
+
Delete session-parser entirely — Remove session-parser.ts and its test file since they have no other consumers.
|
|
35
|
+
|
|
36
|
+
## Should the `sessionId` parameter be removed from `runInteractive()` in claude-runner.ts?
|
|
37
|
+
Remove sessionId from runInteractive() too — Clean up the runner API as well since no other callers pass sessionId.
|