rafcode 2.2.0 → 2.3.0

Files changed (49)
  1. package/RAF/ahtahs-token-reaper/decisions.md +37 -0
  2. package/RAF/ahtahs-token-reaper/input.md +20 -0
  3. package/RAF/ahtahs-token-reaper/outcomes/01-extend-token-tracker-data-model.md +42 -0
  4. package/RAF/ahtahs-token-reaper/outcomes/02-accumulate-usage-in-retry-loop.md +31 -0
  5. package/RAF/ahtahs-token-reaper/outcomes/03-per-attempt-display-formatting.md +60 -0
  6. package/RAF/ahtahs-token-reaper/outcomes/04-add-model-name-to-claude-call-logs.md +57 -0
  7. package/RAF/ahtahs-token-reaper/outcomes/05-handle-invalid-config-in-raf-config.md +46 -0
  8. package/RAF/ahtahs-token-reaper/outcomes/06-fix-verbose-toggle-timer-display.md +38 -0
  9. package/RAF/ahtahs-token-reaper/plans/01-extend-token-tracker-data-model.md +36 -0
  10. package/RAF/ahtahs-token-reaper/plans/02-accumulate-usage-in-retry-loop.md +36 -0
  11. package/RAF/ahtahs-token-reaper/plans/03-per-attempt-display-formatting.md +43 -0
  12. package/RAF/ahtahs-token-reaper/plans/04-add-model-name-to-claude-call-logs.md +38 -0
  13. package/RAF/ahtahs-token-reaper/plans/05-handle-invalid-config-in-raf-config.md +36 -0
  14. package/RAF/ahtahs-token-reaper/plans/06-fix-verbose-toggle-timer-display.md +40 -0
  15. package/dist/commands/config.d.ts.map +1 -1
  16. package/dist/commands/config.js +27 -5
  17. package/dist/commands/config.js.map +1 -1
  18. package/dist/commands/do.js +17 -10
  19. package/dist/commands/do.js.map +1 -1
  20. package/dist/commands/plan.js +3 -2
  21. package/dist/commands/plan.js.map +1 -1
  22. package/dist/core/pull-request.d.ts.map +1 -1
  23. package/dist/core/pull-request.js +3 -1
  24. package/dist/core/pull-request.js.map +1 -1
  25. package/dist/utils/config.d.ts +6 -0
  26. package/dist/utils/config.d.ts.map +1 -1
  27. package/dist/utils/config.js +21 -0
  28. package/dist/utils/config.js.map +1 -1
  29. package/dist/utils/terminal-symbols.d.ts +8 -4
  30. package/dist/utils/terminal-symbols.d.ts.map +1 -1
  31. package/dist/utils/terminal-symbols.js +31 -6
  32. package/dist/utils/terminal-symbols.js.map +1 -1
  33. package/dist/utils/token-tracker.d.ts +11 -1
  34. package/dist/utils/token-tracker.d.ts.map +1 -1
  35. package/dist/utils/token-tracker.js +37 -2
  36. package/dist/utils/token-tracker.js.map +1 -1
  37. package/package.json +1 -1
  38. package/src/commands/config.ts +30 -4
  39. package/src/commands/do.ts +17 -10
  40. package/src/commands/plan.ts +3 -2
  41. package/src/core/pull-request.ts +3 -1
  42. package/src/utils/config.ts +22 -0
  43. package/src/utils/terminal-symbols.ts +42 -7
  44. package/src/utils/token-tracker.ts +44 -2
  45. package/tests/unit/config-command.test.ts +80 -1
  46. package/tests/unit/config.test.ts +24 -0
  47. package/tests/unit/terminal-symbols.test.ts +121 -33
  48. package/tests/unit/timer-verbose-integration.test.ts +170 -0
  49. package/tests/unit/token-tracker.test.ts +350 -17
@@ -0,0 +1,37 @@
1
+ # Project Decisions
2
+
3
+ ## For the per-task token summary, should it show accumulated total or per-attempt breakdown?
4
+ Per-attempt breakdown — show token usage for each attempt individually, plus a combined total.
5
+
6
+ ## Should per-attempt breakdown appear in normal output or only with --verbose?
7
+ Always show breakdown — per-attempt details shown regardless of verbose flag for full cost transparency.
8
+
9
+ ## Should TokenTracker store per-attempt data, or should accumulation happen in do.ts?
10
+ Tracker stores attempts — TokenTracker gains a richer data model with per-attempt entries. addTask accepts an array of UsageData. Centralized logic.
11
+
12
+ ## Should the grand total summary also show per-attempt breakdown?
13
+ Grand total only — the final summary shows combined totals. Per-attempt detail is available in individual task summaries above.
14
+
15
+ ## What format for the model name in log messages?
16
+ "...with sonnet" style — append 'with <model>' before the ellipsis, e.g., "Generating project name suggestions with sonnet..."
17
+
18
+ ## Should the model name be the short alias or full model ID?
19
+ Short alias — display friendly names like 'sonnet', 'haiku', 'opus'. Cleaner output.
20
+
21
+ ## Should model-in-log apply only to name generation or all Claude calls?
22
+ All Claude calls — add model names to all log messages where RAF invokes Claude (name generation, failure analysis, PR generation, config session).
23
+
24
+ ## When config is invalid, should `raf config` silently fall back or warn?
25
+ Warn then continue — show a warning about the invalid config, then launch the interactive session normally with defaults.
26
+
27
+ ## Should config resilience apply to all commands or only `raf config`?
28
+ Only `raf config` — it's the recovery tool. Other commands can still fail fast on invalid config.
29
+
30
+ ## When verbose is ON, should the task name and elapsed time be shown as a header?
31
+ No header at all — when verbose is ON, only show Claude's raw output and tool descriptions. No task name or timer.
32
+
33
+ ## When toggling back to verbose OFF, should the timer resume or reset?
34
+ Resume counting — timer continues from actual elapsed time since task start.
35
+
36
+ ## When verbose is ON, should tool use descriptions still be shown?
37
+ Show both — show Claude's text AND tool use descriptions (→ Reading file.ts, → Running: npm test, etc.).
@@ -0,0 +1,20 @@
1
+ - [ ] **Accumulate token usage across retry attempts** When a task retries, this assignment overwrites prior `usageData`, and the tracker is only updated once after the retry loop, so tokens/cost from earlier failed attempts are dropped. In any task that takes multiple attempts, the per-task and total summaries underreport actual consumption, which skews cost reporting for long or flaky runs.
2
+
3
+ ---
4
+
5
+ When I switch to verbose mode, I see the output together with the timer and task name repeated on each line. Could you remove the interactive timer when verbose mode is ON and restore it when verbose is OFF? Also, don't print the task name on each line in verbose ON mode. See log: ```● 01-extend-token-tracker-data-model 34s  [verbose: on]
6
+ ● 01-extend-token-tracker-data-model 37s  → Updating task list
7
+
8
+ ● 01-extend-token-tracker-data-model 39sNow let me add the `accumulateUsage()` function. I'll add it before the TokenTracker class.
9
+
10
+ ● 01-extend-token-tracker-data-model 46s  → Editing /Users/eremeev/.raf/worktrees/RAF/ahtahs-token-reaper/src/utils/token-tracker.ts
11
+
12
+ ● 01-extend-token-tracker-data-model 50s  → Updating task list
13
+
14
+ ● 01-extend-token-tracker-data-model 52sNow let me update the `addTask()` method to accept an array.
15
+
16
+ ● 01-extend-token-tracker-data-model 53s  → Reading /Users/eremeev/.raf/worktrees/RAF/ahtahs-token-reaper/src/utils/token-tracker.ts
17
+
18
+ ● 01-extend-token-tracker-data-model 55sNow let me update the `addTask()` method to accept an array of UsageData.
19
+
20
+ ● 01-extend-token-tracker-data-model 56s  [verbose: off]```
@@ -0,0 +1,42 @@
1
+ # Task 01: Extend TokenTracker Data Model
2
+
3
+ ## Summary
4
+
5
+ Refactored TokenTracker to accept and store per-attempt UsageData entries per task, enabling accurate token tracking across retries.
6
+
7
+ ## Changes Made
8
+
9
+ ### src/utils/token-tracker.ts
10
+ - Added `attempts: UsageData[]` field to `TaskUsageEntry` interface
11
+ - Created `accumulateUsage()` utility function that merges multiple UsageData objects into one, summing all token fields and merging modelUsage maps (handles different models across attempts)
12
+ - Updated `addTask()` signature to accept `UsageData[]` instead of single `UsageData`
13
+ - `addTask()` now calls `accumulateUsage()` to compute combined usage and stores raw attempts for future display breakdowns
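The merge described above can be illustrated with a minimal sketch. The diff does not show the real `UsageData` shape, so the field names below (`inputTokens`, `cacheReadInputTokens`, and so on) and the helpers `emptyCounts()` and `addCounts()` are assumptions made for illustration; only `accumulateUsage()`, `UsageData`, and `modelUsage` are names taken from the outcome notes.

```typescript
// Hypothetical shapes for illustration; the real UsageData in
// src/utils/token-tracker.ts may use different field names.
interface TokenCounts {
  inputTokens: number;
  outputTokens: number;
  cacheReadInputTokens: number;
  cacheCreationInputTokens: number;
}

interface UsageData extends TokenCounts {
  // Per-model breakdown, keyed by model ID.
  modelUsage: Record<string, TokenCounts>;
}

// Hypothetical helper: a fresh all-zero counter object.
function emptyCounts(): TokenCounts {
  return { inputTokens: 0, outputTokens: 0, cacheReadInputTokens: 0, cacheCreationInputTokens: 0 };
}

// Hypothetical helper: adds the counters of `source` into `target` (mutates `target` only).
function addCounts(target: TokenCounts, source: TokenCounts): void {
  target.inputTokens += source.inputTokens;
  target.outputTokens += source.outputTokens;
  target.cacheReadInputTokens += source.cacheReadInputTokens;
  target.cacheCreationInputTokens += source.cacheCreationInputTokens;
}

// Merges per-attempt usage into one combined UsageData without mutating the inputs.
function accumulateUsage(attempts: UsageData[]): UsageData {
  const total: UsageData = { ...emptyCounts(), modelUsage: {} };
  for (const attempt of attempts) {
    addCounts(total, attempt);
    for (const [model, counts] of Object.entries(attempt.modelUsage)) {
      // Start from zero the first time a model appears, then keep summing.
      const merged = total.modelUsage[model] ?? emptyCounts();
      addCounts(merged, counts);
      total.modelUsage[model] = merged;
    }
  }
  return total;
}

// Example: Opus on the first attempt, Sonnet on the retry.
const firstAttempt: UsageData = {
  inputTokens: 1234, outputTokens: 567, cacheReadInputTokens: 0, cacheCreationInputTokens: 0,
  modelUsage: { opus: { inputTokens: 1234, outputTokens: 567, cacheReadInputTokens: 0, cacheCreationInputTokens: 0 } },
};
const retryAttempt: UsageData = {
  inputTokens: 2345, outputTokens: 890, cacheReadInputTokens: 0, cacheCreationInputTokens: 0,
  modelUsage: { sonnet: { inputTokens: 2345, outputTokens: 890, cacheReadInputTokens: 0, cacheCreationInputTokens: 0 } },
};
const combined = accumulateUsage([firstAttempt, retryAttempt]);
// combined.inputTokens is 3579 and combined.outputTokens is 1457,
// with separate "opus" and "sonnet" entries in combined.modelUsage.
```

Building the result from fresh counter objects, rather than reusing the first attempt as the accumulator, is one way to satisfy the non-mutation test case listed for this task.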
14
+
15
+ ### src/commands/do.ts
16
+ - Updated two call sites to wrap single `lastUsageData` in array `[lastUsageData]`
17
+ - Added TODO comments indicating these should pass all attempt data once retry loop accumulates them
18
+
19
+ ### tests/unit/token-tracker.test.ts
20
+ - Updated all existing test calls to use array syntax `[usage]`
21
+ - Added new tests for:
22
+ - `accumulateUsage()` function (empty array, single element, multi-element, multi-model merging, non-mutation)
23
+ - Multi-attempt accumulation in `addTask()`
24
+ - Cost calculation for multi-model retry scenarios
25
+ - `attempts` array storage in entries
26
+
27
+ ## Acceptance Criteria Verification
28
+
29
+ - [x] `TaskUsageEntry` has an `attempts: UsageData[]` field
30
+ - [x] `addTask()` accepts an array and correctly accumulates tokens across attempts
31
+ - [x] `accumulateUsage()` correctly sums all token fields including per-model breakdowns
32
+ - [x] `getTotals()` returns correct grand totals when tasks have multiple attempts
33
+ - [x] Single-attempt tasks behave identically to before
34
+ - [x] All existing and new token-tracker tests pass (27 tests)
35
+
36
+ ## Notes
37
+
38
+ - The `accumulateUsage()` function handles the case where different attempts use different models (e.g., Opus on first attempt, Sonnet on retry due to fallback)
39
+ - `calculateCost()` was left unchanged as designed - it operates on the accumulated UsageData
40
+ - Pre-existing test failures in validation.test.ts and claude-runner-interactive.test.ts are unrelated to this task
41
+
42
+ <promise>COMPLETE</promise>
@@ -0,0 +1,31 @@
1
+ # Task 02: Accumulate Usage in Retry Loop
2
+
3
+ ## Summary
4
+
5
+ Modified the retry loop in `do.ts` to collect usage data from every attempt instead of overwriting it, and pass the full array to TokenTracker for accurate token tracking across retries.
6
+
7
+ ## Changes Made
8
+
9
+ ### src/commands/do.ts
10
+ - Replaced `let lastUsageData: UsageData | undefined` with `const attemptUsageData: UsageData[] = []`
11
+ - Changed from overwriting `lastUsageData = result.usageData` to `attemptUsageData.push(result.usageData)` when usage data is present
12
+ - Updated success path (lines ~1091-1095): now checks `attemptUsageData.length > 0` and passes the full array to `tokenTracker.addTask()`
13
+ - Updated failure path (lines ~1118-1122): same change, passes full array for partial data tracking
14
+ - Removed TODO comments that were added in Task 01 as placeholders
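The collection pattern in this change can be sketched roughly as follows. This is not the code from `do.ts`: the real loop is not shown in the diff and is likely asynchronous, so `runWithRetries()`, `AttemptResult`, and the simplified `UsageData` shape are hypothetical stand-ins. Only the `attemptUsageData` array and the push-when-defined rule come from the notes above.

```typescript
// Simplified, hypothetical shapes; the real types live in the RAF source.
interface UsageData {
  inputTokens: number;
  outputTokens: number;
}

interface AttemptResult {
  success: boolean;
  usageData?: UsageData; // absent when an attempt times out or crashes
}

// Hypothetical synchronous stand-in for the retry loop in do.ts.
function runWithRetries(
  runAttempt: (attempt: number) => AttemptResult,
  maxAttempts: number,
): { success: boolean; attemptUsageData: UsageData[] } {
  const attemptUsageData: UsageData[] = [];
  let success = false;
  for (let attempt = 1; attempt <= maxAttempts && !success; attempt++) {
    const result = runAttempt(attempt);
    // Push instead of overwrite, and skip attempts that produced no usage data.
    if (result.usageData) {
      attemptUsageData.push(result.usageData);
    }
    success = result.success;
  }
  return { success, attemptUsageData };
}

// Example: a failed attempt, a crash without usage data, then a success.
const scriptedResults: AttemptResult[] = [
  { success: false, usageData: { inputTokens: 100, outputTokens: 10 } },
  { success: false },
  { success: true, usageData: { inputTokens: 200, outputTokens: 20 } },
];
const outcome = runWithRetries((attempt) => scriptedResults[attempt - 1] ?? { success: false }, 3);
// outcome.attemptUsageData holds two entries: the crashed attempt is excluded.
```

After the loop, the caller would pass `outcome.attemptUsageData` to the tracker only when the array is non-empty, on both the success and the failure path.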
15
+
16
+ ## Acceptance Criteria Verification
17
+
18
+ - [x] Usage data from all retry attempts is collected in an array
19
+ - [x] The full array is passed to `tokenTracker.addTask()`
20
+ - [x] Attempts with no usage data (timeout/crash) are excluded from the array (only push when `result.usageData` is defined)
21
+ - [x] Single-attempt tasks still work correctly (array of length 1)
22
+ - [x] All tests pass (token-tracker: 27 tests, do-*: 44 tests)
23
+
24
+ ## Notes
25
+
26
+ - The `lastOutput` variable remains unchanged as designed - only final output matters for result parsing
27
+ - The existing tests from Task 01 already cover the accumulation logic in `TokenTracker` and `accumulateUsage()`
28
+ - The change is minimal and surgical - only the usage data collection mechanism was updated
29
+ - Edge cases (timeouts, crashes, context overflow) correctly result in no usage data being pushed for that attempt
30
+
31
+ <promise>COMPLETE</promise>
@@ -0,0 +1,60 @@
1
+ # Task 03: Per-Attempt Display Formatting
2
+
3
+ ## Summary
4
+
5
+ Updated `formatTaskTokenSummary()` to display a per-attempt breakdown when a task took multiple attempts, while keeping single-attempt output unchanged.
6
+
7
+ ## Changes Made
8
+
9
+ ### src/utils/terminal-symbols.ts
10
+ - Added import for `TaskUsageEntry` type from token-tracker
11
+ - Created internal `formatTokenLine()` helper function that formats a single line of token usage (used for both attempts and totals)
12
+ - Updated `formatTaskTokenSummary()` signature to accept:
13
+ - `entry: TaskUsageEntry` (replaces separate `usage` and `cost` parameters)
14
+ - `calculateAttemptCost?: (usage: UsageData) => CostBreakdown` (optional callback for per-attempt cost calculation)
15
+ - Single-attempt behavior: When `entry.attempts.length <= 1`, output is identical to previous format: `" Tokens: X in / Y out | Cache: ... | Est. cost: $X.XX"`
16
+ - Multi-attempt behavior: Shows per-attempt breakdown with:
17
+ - Each attempt on its own line: `" Attempt N: X in / Y out | Cache: ... | Est. cost: $X.XX"`
18
+ - Total line at the end: `" Total: X in / Y out | Cache: ... | Est. cost: $X.XX"`
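A rough sketch of the branching described above, under stated assumptions: the `UsageData` fields, the `CostBreakdown.totalCost` field, and the exact cache segment are guesses, since the diff only shows the rendered strings. The 2-space and 4-space indents follow the notes for this task, and only cache reads are rendered here to keep the sketch short.

```typescript
// Hypothetical shapes; the real ones are defined in token-tracker.ts.
interface UsageData {
  inputTokens: number;
  outputTokens: number;
  cacheReadInputTokens: number;
}
interface CostBreakdown {
  totalCost: number;
}
interface TaskUsageEntry {
  taskId: string;
  usage: UsageData;      // accumulated across attempts
  cost: CostBreakdown;   // accumulated across attempts
  attempts: UsageData[]; // raw per-attempt data
}

const formatCount = (n: number): string => n.toLocaleString("en-US");

// Formats one line; used for single-attempt, per-attempt, and total lines.
function formatTokenLine(label: string, usage: UsageData, cost: CostBreakdown, indent: string): string {
  const parts = [`${indent}${label}: ${formatCount(usage.inputTokens)} in / ${formatCount(usage.outputTokens)} out`];
  if (usage.cacheReadInputTokens > 0) {
    parts.push(`Cache: ${formatCount(usage.cacheReadInputTokens)} read`);
  }
  parts.push(`Est. cost: $${cost.totalCost.toFixed(2)}`);
  return parts.join(" | ");
}

function formatTaskTokenSummary(
  entry: TaskUsageEntry,
  calculateAttemptCost?: (usage: UsageData) => CostBreakdown,
): string {
  // Single attempt: keep the pre-existing one-line format.
  if (entry.attempts.length <= 1) {
    return formatTokenLine("Tokens", entry.usage, entry.cost, "  ");
  }
  // Multiple attempts: one line per attempt, then the accumulated total.
  const lines = entry.attempts.map((attempt, index) =>
    formatTokenLine(
      `Attempt ${index + 1}`,
      attempt,
      calculateAttemptCost ? calculateAttemptCost(attempt) : { totalCost: 0 },
      "    ",
    ),
  );
  lines.push(formatTokenLine("Total", entry.usage, entry.cost, "    "));
  return lines.join("\n");
}
```

Passing the cost function as an optional callback keeps the formatter free of pricing logic; when it is omitted, per-attempt lines fall back to `$0.00` while the total line still uses the accumulated cost stored on the entry.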
19
+
20
+ ### src/commands/do.ts
21
+ - Updated both call sites (success and failure paths) to pass the full `TaskUsageEntry` and the `calculateCost` callback:
22
+ - `logger.dim(formatTaskTokenSummary(entry, (u) => tokenTracker.calculateCost(u)))`
23
+
24
+ ### tests/unit/terminal-symbols.test.ts
25
+ - Added import for `TaskUsageEntry` type
26
+ - Created `makeEntry()` helper to construct `TaskUsageEntry` objects for testing
27
+ - Reorganized `formatTaskTokenSummary` tests into two describe blocks:
28
+ - `single-attempt tasks`: 6 tests verifying unchanged behavior for single-attempt scenarios
29
+ - `multi-attempt tasks`: 4 tests covering multi-attempt formatting, cost calculation, cache tokens, and 3+ attempts
30
+
31
+ ## Example Output
32
+
33
+ **Single-attempt (unchanged):**
34
+ ```
35
+ Tokens: 5,234 in / 1,023 out | Cache: 18,500 read | Est. cost: $0.42
36
+ ```
37
+
38
+ **Multi-attempt (new):**
39
+ ```
40
+ Attempt 1: 1,234 in / 567 out | Est. cost: $0.02
41
+ Attempt 2: 2,345 in / 890 out | Est. cost: $0.04
42
+ Total: 3,579 in / 1,457 out | Est. cost: $0.06
43
+ ```
44
+
45
+ ## Acceptance Criteria Verification
46
+
47
+ - [x] Single-attempt tasks display identically to current format
48
+ - [x] Multi-attempt tasks show per-attempt lines plus a total
49
+ - [x] Formatting is clean and readable in terminal output
50
+ - [x] `formatTokenTotalSummary()` is unchanged
51
+ - [x] All call sites updated
52
+ - [x] All tests pass (135 tests including 10 new tests for this feature)
53
+
54
+ ## Notes
55
+
56
+ - The `calculateAttemptCost` callback is optional; when not provided, per-attempt costs show `$0.00` (the total still shows accurate accumulated cost)
57
+ - Per-attempt lines use 4-space indent to visually nest under the task, while single-attempt uses 2-space indent
58
+ - Cache tokens are included in per-attempt breakdowns when present
59
+
60
+ <promise>COMPLETE</promise>
@@ -0,0 +1,57 @@
1
+ # Task 04: Add Model Name to Claude Invocation Logs
2
+
3
+ ## Summary
4
+
5
+ Added a `getModelShortName()` utility function and updated all four Claude invocation log messages to display the short model alias (e.g., "sonnet", "haiku", "opus").
6
+
7
+ ## Changes Made
8
+
9
+ ### src/utils/config.ts
10
+ - Added `getModelShortName(modelId: string)` utility function that:
11
+ - Returns short aliases (`opus`, `sonnet`, `haiku`) as-is
12
+ - Extracts family from full model IDs (e.g., `claude-sonnet-4-5-20250929` → `sonnet`)
13
+ - Returns unknown model IDs as-is for graceful fallback
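The three rules above could be implemented along these lines. This is a sketch, not the shipped function: the matching strategy (splitting the ID on `-` and looking for a known family segment) and the `MODEL_FAMILIES` constant are assumptions, and the real `getModelShortName()` may use a different approach.

```typescript
// Known model families; the list itself is an assumption based on the aliases named in the notes.
const MODEL_FAMILIES = ["opus", "sonnet", "haiku"];

function getModelShortName(modelId: string): string {
  // A bare alias such as "sonnet" splits into a single segment and matches directly;
  // a full ID such as "claude-sonnet-4-5-20250929" matches on its family segment.
  const segments = modelId.split("-");
  for (const family of MODEL_FAMILIES) {
    if (segments.includes(family)) {
      return family;
    }
  }
  // Unknown IDs are returned unchanged so log messages still show something useful.
  return modelId;
}

// Example log message in the "...with <model>..." style.
const nameModel = getModelShortName("claude-sonnet-4-5-20250929");
const logMessage = `Generating project name suggestions with ${nameModel}...`;
// logMessage is "Generating project name suggestions with sonnet..."
```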
14
+
15
+ ### src/commands/plan.ts
16
+ - Added import for `getModel` and `getModelShortName`
17
+ - Updated name generation log: `"Generating project name suggestions with ${nameModel}..."`
18
+
19
+ ### src/commands/do.ts
20
+ - Added import for `getModel` and `getModelShortName`
21
+ - Updated failure analysis log: `"Analyzing failure with ${analysisModel}..."`
22
+
23
+ ### src/core/pull-request.ts
24
+ - Added import for `getModelShortName`
25
+ - Added new log message in `generatePrBody()`: `"Generating PR with ${prModel}..."`
26
+
27
+ ### src/commands/config.ts
28
+ - Added import for `getModelShortName`
29
+ - Consolidated two log lines into one: `"Starting config session with ${configModel}..."`
30
+ - Previously: "Starting config session with Claude..." + "Using model: ${model}"
31
+ - Now: single line with short model name
32
+
33
+ ### tests/unit/config.test.ts
34
+ - Added import for `getModelShortName`
35
+ - Added test suite with 3 test cases:
36
+ - `should return short aliases as-is`
37
+ - `should extract family from full model IDs`
38
+ - `should return unknown model IDs as-is`
39
+
40
+ ## Acceptance Criteria Verification
41
+
42
+ - [x] All four Claude invocation points show the model short name in their log messages
43
+ - Name generation: `"Generating project name suggestions with sonnet..."`
44
+ - Failure analysis: `"Analyzing failure with haiku..."`
45
+ - PR generation: `"Generating PR with sonnet..."`
46
+ - Config session: `"Starting config session with sonnet..."`
47
+ - [x] Short name extraction works for full model IDs and already-short names
48
+ - [x] Log format follows the "...with <model>..." pattern
49
+ - [x] Unit tests cover the short name utility (3 tests)
50
+ - [x] All tests pass (95 config tests, 1156 total passing)
51
+
52
+ ## Notes
53
+
54
+ - Pre-existing test failures in `validation.test.ts` and `claude-runner-interactive.test.ts` are unrelated to this task
55
+ - The `getModelShortName()` function reuses logic similar to `resolveModelPricingCategory()` but returns the original string for unknown models instead of `null`
56
+
57
+ <promise>COMPLETE</promise>
@@ -0,0 +1,46 @@
1
+ # Task 05: Handle Invalid Config Gracefully in raf config Command
2
+
3
+ ## Summary
4
+
5
+ Made `raf config` resilient to invalid or corrupt config files so it can serve as the recovery path for broken configurations. Previously, if `~/.raf/raf.config.json` contained invalid JSON or failed schema validation, `raf config` would crash before the interactive session could launch, blocking users from fixing the issue.
6
+
7
+ ## Changes Made
8
+
9
+ ### src/commands/config.ts
10
+ - Added import for `resetConfigCache` from config utilities
11
+ - Added import for `DEFAULT_CONFIG` from types/config
12
+ - Wrapped `getModel('config')` and `getEffort('config')` calls in try-catch block
13
+ - On error, falls back to `DEFAULT_CONFIG.models.config` ('sonnet') and `DEFAULT_CONFIG.effort.config` ('medium')
14
+ - Displays warning message with the specific error: "Config file has errors, using defaults: {message}"
15
+ - Provides guidance: "Fix the config in this session or run `raf config --reset` to start fresh."
16
+ - Calls `resetConfigCache()` to clear any broken cached config
17
+ - The interactive Claude session still receives the broken config file contents via `getCurrentConfigState()`, so the user can see and fix the issue
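The fallback flow described above might look roughly like this. The sketch injects its dependencies so it can run on its own; `resolveConfigSessionSettings()` and the `ConfigDeps` interface are hypothetical, and the `DEFAULT_CONFIG` literal is reduced to the two values the notes mention. The warning and guidance strings are the ones quoted in the list.

```typescript
// Reduced stand-in for DEFAULT_CONFIG; only the two values named in the notes are included.
const DEFAULT_CONFIG = {
  models: { config: "sonnet" },
  effort: { config: "medium" },
};

// Hypothetical dependency bundle so the sketch does not need the real config module.
interface ConfigDeps {
  getModel: (scenario: "config") => string;
  getEffort: (scenario: "config") => string;
  resetConfigCache: () => void;
  warn: (message: string) => void;
}

function resolveConfigSessionSettings(deps: ConfigDeps): { model: string; effort: string } {
  try {
    return { model: deps.getModel("config"), effort: deps.getEffort("config") };
  } catch (error) {
    // Invalid JSON or a schema failure lands here; warn, then continue with defaults.
    const message = error instanceof Error ? error.message : String(error);
    deps.warn(`Config file has errors, using defaults: ${message}`);
    deps.warn("Fix the config in this session or run `raf config --reset` to start fresh.");
    deps.resetConfigCache(); // drop any broken cached config
    return { model: DEFAULT_CONFIG.models.config, effort: DEFAULT_CONFIG.effort.config };
  }
}
```

Keeping the try-catch local to the `raf config` entry point matches the decision recorded earlier in this diff: other commands keep failing fast, and only the recovery tool tolerates a broken file.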
18
+
19
+ ### tests/unit/config-command.test.ts
20
+ - Added imports for `resolveConfig`, `getModel`, `getEffort`, `resetConfigCache`, and `DEFAULT_CONFIG`
21
+ - Added `resetConfigCache()` calls to beforeEach/afterEach for test isolation
22
+ - Added new test suite "Error recovery - invalid config fallback" with 6 tests:
23
+ - Throws on invalid JSON when resolving config
24
+ - Throws on schema validation failure when resolving config
25
+ - Default fallback values are correct for config scenario
26
+ - Raw file contents readable even with invalid JSON
27
+ - Raw file contents readable even with schema validation failure
28
+ - resetConfigCache clears the cached config
29
+
30
+ ## Acceptance Criteria Verification
31
+
32
+ - [x] `raf config` launches successfully even when `~/.raf/raf.config.json` is invalid JSON
33
+ - [x] `raf config` launches successfully even when config fails schema validation
34
+ - [x] A clear warning is displayed to the user about the config error
35
+ - [x] The interactive session uses default model/effort values as fallback
36
+ - [x] The broken config content is still visible in the session for the user to fix
37
+ - [x] Other commands (`raf plan`, `raf do`, `raf status`) still fail fast on invalid config
38
+ - [x] All tests pass (121 config-related tests, 1162 total passing; pre-existing failures in validation.test.ts and claude-runner-interactive.test.ts are unrelated)
39
+
40
+ ## Notes
41
+
42
+ - The error handling is specific to `raf config` - other commands continue to fail fast on invalid config as expected
43
+ - The `getCurrentConfigState()` function reads raw file content without JSON parsing, so broken content is always available to Claude in the session
44
+ - The `--reset` option doesn't need this fix since it deletes the file without loading it
45
+
46
+ <promise>COMPLETE</promise>
@@ -0,0 +1,38 @@
1
+ # Task 06: Fix Verbose Toggle Timer Display
2
+
3
+ ## Summary
4
+
5
+ Modified the timer callback in `do.ts` to check the verbose toggle state on each tick. When verbose is toggled ON at runtime, the status line is immediately cleared and updates are skipped. When toggled back OFF, the timer resumes displaying with the accurate elapsed time.
6
+
7
+ ## Changes Made
8
+
9
+ ### src/commands/do.ts
10
+ - Updated the `onTick` callback (lines 915-923) to check `verboseToggle.isVerbose` on every tick
11
+ - When verbose is ON: calls `statusLine.clear()` and returns early (skipping the update)
12
+ - When verbose is OFF: updates the status line as normal with task progress
13
+ - The timer continues tracking elapsed time internally regardless of display state
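A minimal sketch of the tick handler described above. The `VerboseToggle` and `StatusLine` interfaces, the `createOnTick()` factory, and the status text format (modeled on the log excerpt in `input.md`) are assumptions; the diff only confirms that the callback checks `verboseToggle.isVerbose` and calls `statusLine.clear()` before returning early.

```typescript
// Hypothetical minimal interfaces for the objects the callback touches.
interface VerboseToggle {
  isVerbose: boolean;
}
interface StatusLine {
  update(text: string): void;
  clear(): void;
}

// Hypothetical factory; the real callback is defined inline in do.ts.
function createOnTick(
  verboseToggle: VerboseToggle,
  statusLine: StatusLine,
  taskName: string,
): (elapsedSeconds: number) => void {
  return (elapsedSeconds: number): void => {
    if (verboseToggle.isVerbose) {
      // Verbose ON: remove the status line and skip the update.
      statusLine.clear();
      return;
    }
    // Verbose OFF: show the task name with the elapsed time supplied by the timer.
    statusLine.update(`● ${taskName} ${elapsedSeconds}s`);
  };
}
```

Because the elapsed time is passed in by the timer rather than counted inside the callback, the first tick after toggling verbose OFF shows the true elapsed time since the task started.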
14
+
15
+ ### tests/unit/timer-verbose-integration.test.ts (new file)
16
+ - Created new test file with 5 tests covering the timer-verbose integration:
17
+ - `should update status line when verbose is off`
18
+ - `should clear status line and skip update when verbose is toggled on`
19
+ - `should resume updating status line when verbose is toggled back off`
20
+ - `should track elapsed time correctly regardless of verbose state`
21
+ - `should not create timer callback when started with verbose flag`
22
+
23
+ ## Acceptance Criteria Verification
24
+
25
+ - [x] Toggling verbose ON clears the status line and stops timer/task-name display
26
+ - [x] Toggling verbose OFF resumes the timer/status line with correct elapsed time
27
+ - [x] No task name prefix appears on verbose output lines (status line cleared immediately)
28
+ - [x] Starting with `--verbose` flag still works as before (no timer callback created)
29
+ - [x] Timer internally tracks elapsed time correctly regardless of display state
30
+ - [x] All existing tests pass (1167 passing; 3 pre-existing failures in validation.test.ts and claude-runner-interactive.test.ts are unrelated)
31
+
32
+ ## Notes
33
+
34
+ - The fix is minimal: just 4 lines added to the existing `onTick` callback
35
+ - The `statusLine.clear()` call happens on every tick while verbose is on, which is safe because the clear operation is idempotent
36
+ - The next tick after toggling verbose OFF will immediately show the correct elapsed time since the timer tracks time independently
37
+
38
+ <promise>COMPLETE</promise>
@@ -0,0 +1,36 @@
1
+ # Task: Extend TokenTracker to store per-attempt usage data
2
+
3
+ ## Objective
4
+ Refactor TokenTracker to accept and store an array of per-attempt UsageData entries per task, instead of a single UsageData.
5
+
6
+ ## Context
7
+ Currently TokenTracker stores one `UsageData` per task via `addTask(taskId, usage)`. When a task retries, only the last attempt's data reaches the tracker. To fix underreporting, the tracker needs to accept multiple attempt entries per task and compute totals from all of them.
8
+
9
+ ## Requirements
10
+ - Change `TaskUsageEntry` to hold an array of attempt `UsageData` entries alongside the aggregated totals
11
+ - Update `addTask()` to accept an array of `UsageData` (one per attempt) instead of a single `UsageData`
12
+ - The per-entry `usage` field should be the sum of all attempts (for backward compatibility with `getTotals()`)
13
+ - The per-entry `cost` field should be the sum of all attempts' costs
14
+ - Store the raw per-attempt data so formatting functions can display breakdowns
15
+ - `getTotals()` should continue to work correctly — it already sums across entries, so as long as each entry's `usage` is the accumulated total, no changes needed there
16
+ - Add a helper method or utility to merge/accumulate multiple `UsageData` objects into one
17
+ - Maintain backward compatibility: if only one attempt occurred, behavior is identical to today
18
+ - Cover changes with unit tests
19
+
20
+ ## Implementation Steps
21
+ 1. Add an `attempts` field to `TaskUsageEntry` that stores the array of individual `UsageData` objects
22
+ 2. Create an `accumulateUsage()` utility that merges multiple `UsageData` into a single combined `UsageData` (summing all token fields and merging `modelUsage` maps)
23
+ 3. Update `addTask()` signature to accept `UsageData[]` — it calls `accumulateUsage()` to compute the combined `usage` and `calculateCost()` on the combined result
24
+ 4. Update existing tests and add new tests for multi-attempt accumulation
25
+
26
+ ## Acceptance Criteria
27
+ - [ ] `TaskUsageEntry` has an `attempts: UsageData[]` field
28
+ - [ ] `addTask()` accepts an array and correctly accumulates tokens across attempts
29
+ - [ ] `accumulateUsage()` correctly sums all token fields including per-model breakdowns
30
+ - [ ] `getTotals()` returns correct grand totals when tasks have multiple attempts
31
+ - [ ] Single-attempt tasks behave identically to before
32
+ - [ ] All existing and new tests pass
33
+
34
+ ## Notes
35
+ - The `accumulateUsage()` helper should handle merging `modelUsage` maps where different attempts may use different models (e.g., attempt 1 uses Opus, retry uses Sonnet via fallback)
36
+ - Keep `calculateCost()` unchanged — it operates on a single `UsageData` which is the accumulated total
@@ -0,0 +1,36 @@
1
+ # Task: Accumulate usage data across retry attempts in the retry loop
2
+
3
+ ## Objective
4
+ Change the retry loop in `do.ts` to collect usage data from every attempt instead of overwriting it, and pass the full array to TokenTracker.
5
+
6
+ ## Context
7
+ The retry loop in `src/commands/do.ts` (around line 908-1021) currently declares a single `lastUsageData` variable that gets overwritten on each retry attempt. After the loop, only the final attempt's data is passed to `tokenTracker.addTask()`. This must change to collect all attempts' data.
8
+
9
+ ## Dependencies
10
+ 01
11
+
12
+ ## Requirements
13
+ - Replace the single `lastUsageData` variable with an array that collects `UsageData` from each attempt
14
+ - Push each attempt's `usageData` into the array (when present) instead of overwriting
15
+ - After the retry loop, pass the full array to `tokenTracker.addTask()` (using the new signature from task 01)
16
+ - Both success and failure paths (lines ~1090 and ~1117) should pass the array
17
+ - Handle edge case: some attempts may not produce `usageData` (timeout, crash) — skip those entries
18
+ - Cover changes with tests
19
+
20
+ ## Implementation Steps
21
+ 1. Replace `let lastUsageData: UsageData | undefined` with `const attemptUsageData: UsageData[] = []`
22
+ 2. Inside the retry loop, change the overwrite (`lastUsageData = result.usageData`) to a push (`attemptUsageData.push(result.usageData)`) when `result.usageData` is defined
23
+ 3. Update the success path: call `tokenTracker.addTask(task.id, attemptUsageData)` when the array is non-empty
24
+ 4. Update the failure path: same change
25
+ 5. Add/update tests to verify accumulation across retries
26
+
27
+ ## Acceptance Criteria
28
+ - [ ] Usage data from all retry attempts is collected in an array
29
+ - [ ] The full array is passed to `tokenTracker.addTask()`
30
+ - [ ] Attempts with no usage data (timeout/crash) are excluded from the array
31
+ - [ ] Single-attempt tasks still work correctly (array of length 1)
32
+ - [ ] All tests pass
33
+
34
+ ## Notes
35
+ - The variable `lastOutput` should remain as-is (overwritten each attempt) since only the final output matters for result parsing
36
+ - Look at the `result.output` fallback path (line 971-974) — the old code had a fallback where `lastUsageData = result.output` which seems like a type issue; clean this up if it's not needed
@@ -0,0 +1,43 @@
1
+ # Task: Update token summary formatting to show per-attempt breakdowns
2
+
3
+ ## Objective
4
+ Update `formatTaskTokenSummary()` to display a per-attempt breakdown when a task took multiple attempts, while keeping single-attempt output unchanged.
5
+
6
+ ## Context
7
+ With tasks 01 and 02 complete, the `TaskUsageEntry` now contains an `attempts` array with per-attempt `UsageData`. The formatting function needs to render this breakdown so users can see token consumption per retry attempt.
8
+
9
+ ## Dependencies
10
+ 01, 02
11
+
12
+ ## Requirements
13
+ - When a task has only 1 attempt, output is identical to the current format (no visual change)
14
+ - When a task has multiple attempts, show each attempt's tokens and cost on its own line, followed by a total line
15
+ - Always show the breakdown (not gated by --verbose flag)
16
+ - The grand total summary (`formatTokenTotalSummary`) remains unchanged — it shows combined totals only
17
+ - Update `formatTaskTokenSummary()` signature to accept the `attempts` array from `TaskUsageEntry`
18
+ - Cover changes with unit tests
19
+
20
+ ## Implementation Steps
21
+ 1. Update `formatTaskTokenSummary()` to accept the full `TaskUsageEntry` (or at minimum `usage`, `cost`, and `attempts`)
22
+ 2. For single-attempt tasks (array length 1), render the existing format unchanged
23
+ 3. For multi-attempt tasks, render each attempt on its own indented line with attempt number, tokens, and cost, then a total line
24
+ 4. Update all call sites that invoke `formatTaskTokenSummary()` to pass the attempts data
25
+ 5. Add tests for both single-attempt and multi-attempt formatting
26
+
27
+ ## Acceptance Criteria
28
+ - [ ] Single-attempt tasks display identically to current format
29
+ - [ ] Multi-attempt tasks show per-attempt lines plus a total
30
+ - [ ] Formatting is clean and readable in terminal output
31
+ - [ ] `formatTokenTotalSummary()` is unchanged
32
+ - [ ] All call sites updated
33
+ - [ ] All tests pass
34
+
35
+ ## Notes
36
+ - Example multi-attempt output (approximate):
37
+ ```
38
+ Attempt 1: 1,234 in / 567 out | Est. cost: $0.02
39
+ Attempt 2: 2,345 in / 890 out | Est. cost: $0.04
40
+ Total: 3,579 in / 1,457 out | Est. cost: $0.06
41
+ ```
42
+ - Keep the dim styling consistent with existing token output
43
+ - The TokenTracker's `calculateCost()` can be used to get per-attempt costs if needed
@@ -0,0 +1,38 @@
1
+ # Task: Add model name to all Claude invocation log messages
2
+
3
+ ## Objective
4
+ Display the short model alias (e.g., "sonnet", "haiku") in all log messages where RAF invokes Claude for non-task purposes.
5
+
6
+ ## Context
7
+ When RAF calls Claude for auxiliary tasks (name generation, failure analysis, PR generation, config), it logs a message but doesn't indicate which model is being used. Users want visibility into which model is handling each call, especially since models are configurable per scenario.
8
+
9
+ ## Requirements
10
+ - Use the format "...with <model>..." — append the model name before the trailing ellipsis
11
+ - Display the short alias (sonnet, haiku, opus) not the full model ID
12
+ - Apply to all four Claude invocation log messages:
13
+ 1. Name generation: `src/commands/plan.ts` line 158 — currently "Generating project name suggestions..."
14
+ 2. Failure analysis: `src/commands/do.ts` line 1111 — currently "Analyzing failure..."
15
+ 3. PR generation: `src/core/pull-request.ts` — currently no explicit log message (add one)
16
+ 4. Config session: `src/commands/config.ts` line 184 — currently "Starting config session with Claude..."
17
+ - Create a utility to extract the short alias from a full model ID string (e.g., "claude-sonnet-4-5-20250929" → "sonnet")
18
+ - The model value comes from `getModel()` calls in each module — the short name should be derived from whatever that returns
19
+
20
+ ## Implementation Steps
21
+ 1. Add a `getModelShortName(modelId: string)` utility that extracts the short alias from a model ID string — handle both full IDs ("claude-sonnet-4-5-20250929") and already-short names ("sonnet")
22
+ 2. Update the name generation log in `src/commands/plan.ts` to include the model: "Generating project name suggestions with sonnet..."
23
+ 3. Update the failure analysis log in `src/commands/do.ts` to include the model: "Analyzing failure with haiku..."
24
+ 4. Add a log message for PR generation in `src/core/pull-request.ts`: "Generating PR with haiku..."
25
+ 5. Update the config session log in `src/commands/config.ts` to include the model: "Starting config session with sonnet..."
26
+ 6. Cover the `getModelShortName()` utility with unit tests
27
+
28
+ ## Acceptance Criteria
29
+ - [ ] All four Claude invocation points show the model short name in their log messages
30
+ - [ ] Short name extraction works for full model IDs and already-short names
31
+ - [ ] Log format follows the "...with <model>..." pattern
32
+ - [ ] Unit tests cover the short name utility
33
+ - [ ] All tests pass
34
+
35
+ ## Notes
36
+ - The model for each scenario is retrieved via `getModel('nameGeneration')`, `getModel('failureAnalysis')`, `getModel('prGeneration')`, `getModel('config')` from `src/utils/config.ts`
37
+ - Some call sites may need to retrieve the model earlier or pass it around to have it available at the log point — for instance, name generation logs in `plan.ts` but the model is determined inside `name-generator.ts`
38
+ - For the config session, there's already a line showing the model — consolidate if appropriate
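Reviewer sketch (not part of the package): one way the short-alias extraction described in step 1 might work. The body of the real `getModelShortName()` is not shown in this diff, so the function below uses a different name, and both the alias list and the "return unknown input unchanged" behavior are assumptions.

```typescript
// Assumed set of short aliases, taken from the examples in the plan (sonnet, haiku, opus).
const KNOWN_ALIASES = ['opus', 'sonnet', 'haiku'];

// Hypothetical stand-in for getModelShortName(): accepts a full model ID
// ("claude-sonnet-4-5-20250929") or an already-short name ("sonnet").
function getModelShortNameSketch(modelId: string): string {
  const segments = modelId.toLowerCase().split('-');
  for (const alias of KNOWN_ALIASES) {
    // Match whole dash-separated segments only, so an alias embedded in a longer word does not match.
    if (segments.indexOf(alias) !== -1) {
      return alias;
    }
  }
  // Assumption: an unrecognized model ID is passed through untouched rather than rejected.
  return modelId;
}

// Build a log message in the "...with <model>..." format required above.
function withModel(message: string, modelId: string): string {
  return `${message} with ${getModelShortNameSketch(modelId)}...`;
}
```

For example, `withModel('Analyzing failure', 'claude-haiku-4-5')` would produce a message in the format the Requirements section asks for; the model ID in that call is illustrative only.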
@@ -0,0 +1,36 @@
1
+ # Task: Handle invalid config gracefully in raf config command
2
+
3
+ ## Objective
4
+ Make `raf config` resilient to invalid or corrupt config files so it can serve as the recovery path for broken configurations.
5
+
6
+ ## Context
7
+ In `src/commands/config.ts`, the command calls `getModel('config')` and `getEffort('config')` early in execution. These read from the resolved config, which requires loading and validating `~/.raf/raf.config.json`. If that file contains invalid JSON or fails schema validation, these calls throw and `raf config` exits immediately — blocking the user from using the interactive editor to fix their config. Since `raf config` is the intended way to edit config, it must survive a broken config file.
8
+
9
+ ## Requirements
10
+ - Wrap the config-loading path in `raf config` with error handling that catches JSON parse errors and schema validation failures
11
+ - On error, warn the user with a visible message (e.g., "Config file has errors, using defaults") that includes the specific error
12
+ - Fall back to default config values for model and effort so the interactive session can launch
13
+ - The interactive Claude session should still receive the current (broken) config file contents as context, so the user can see and fix the issue
14
+ - Only apply this resilience to `raf config` — other commands should continue to fail fast on invalid config
15
+ - Cover the error-handling path with tests
16
+
17
+ ## Implementation Steps
18
+ 1. In `src/commands/config.ts`, wrap the `getModel('config')` and `getEffort('config')` calls in a try-catch
19
+ 2. On catch, log a warning with the error details and fall back to the default model/effort values from `DEFAULT_CONFIG`
20
+ 3. Ensure the rest of the command continues normally — the interactive session launches with defaults
21
+ 4. Make sure the broken config file contents are still shown to Claude in the session prompt so the user can diagnose and fix
22
+ 5. Add tests for the error-recovery path (invalid JSON, schema validation failure)
23
+
24
+ ## Acceptance Criteria
25
+ - [ ] `raf config` launches successfully even when `~/.raf/raf.config.json` is invalid JSON
26
+ - [ ] `raf config` launches successfully even when config fails schema validation
27
+ - [ ] A clear warning is displayed to the user about the config error
28
+ - [ ] The interactive session uses default model/effort values as fallback
29
+ - [ ] The broken config content is still visible in the session for the user to fix
30
+ - [ ] Other commands (`raf plan`, `raf do`, `raf status`) still fail fast on invalid config
31
+ - [ ] All tests pass
32
+
33
+ ## Notes
34
+ - Check whether `loadConfig()` or the individual `getModel()`/`getEffort()` accessors are the right place to catch — it may be cleaner to catch at the `loadConfig()` level and return defaults
35
+ - The post-session validation already checks for config errors after the session ends — this change handles the pre-session path
36
+ - Consider whether `raf config --reset` also needs this fix (it probably doesn't since reset deletes the file without loading it)
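Reviewer sketch (not part of the package): the catch-and-fall-back behavior in steps 1 to 3 can be isolated into a small helper, which also makes the two failure paths in step 5 easy to test. All names here are hypothetical, and the default values are placeholders rather than the real contents of `DEFAULT_CONFIG`.

```typescript
// Hypothetical shape for the two values raf config needs before launching a session.
interface SessionSettings {
  model: string;
  effort: string;
}

interface ResolvedSettings {
  settings: SessionSettings;
  // Non-null when loading failed and the defaults were used instead.
  error: Error | null;
}

// Run the loader; on any throw (JSON parse error, schema validation failure),
// return the defaults together with the error so the caller can warn the user.
function resolveWithFallback(
  load: () => SessionSettings,
  defaults: SessionSettings,
): ResolvedSettings {
  try {
    return { settings: load(), error: null };
  } catch (err) {
    const error = err instanceof Error ? err : new Error(String(err));
    return { settings: defaults, error };
  }
}

// Placeholder defaults for illustration only.
const placeholderDefaults: SessionSettings = { model: 'sonnet', effort: 'medium' };

// Simulates a config file containing invalid JSON.
const broken = resolveWithFallback(
  () => JSON.parse('{ "models": ') as SessionSettings,
  placeholderDefaults,
);
```

Here `broken.settings` is the placeholder defaults and `broken.error` carries the parse error for the warning message. Keeping the fallback in the command layer, rather than inside the loader, matches the requirement that other commands keep failing fast.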
@@ -0,0 +1,40 @@
1
+ # Task: Fix Verbose Toggle Timer Display
2
+
3
+ ## Objective
4
+ Stop the interactive timer and task-name prefix from displaying when verbose mode is toggled ON, and resume them when toggled OFF.
5
+
6
+ ## Context
7
+ When a user presses Tab during task execution to toggle verbose ON, the timer/status line continues updating and gets interleaved with Claude's streamed output. This produces garbled lines like:
8
+ ```
9
+ ● 01-extend-token-tracker-data-model 39sNow let me add the accumulateUsage() function.
10
+ ```
11
+ The timer callback and status line need to be aware of the verbose toggle state so they pause/clear when verbose is ON and resume when verbose is OFF.
12
+
13
+ ## Requirements
14
+ - When verbose toggles ON mid-execution: immediately clear the status line and stop timer display updates
15
+ - When verbose toggles OFF mid-execution: resume the timer/status line from the actual elapsed time (no reset)
16
+ - When started with `--verbose` flag: current behavior is already correct (no timer callback) — preserve this
17
+ - No task name or timer shown at all while verbose output is streaming — no header line either
18
+ - Tool use descriptions (→ Reading file.ts) and Claude text output continue to display normally when verbose is ON
19
+ - The timer itself keeps counting internally regardless of display state (elapsed time stays accurate)
20
+
21
+ ## Implementation Steps
22
+ 1. Read the current timer callback setup in `do.ts` where `createTaskTimer` is called — understand how the `onTick` callback currently works
23
+ 2. Read `verbose-toggle.ts` to understand the toggle mechanism and its `isVerbose` property
24
+ 3. Modify the timer's `onTick` callback to check `verboseToggle.isVerbose` on each tick — if verbose is ON, clear the status line and skip the update; if OFF, render the status line as normal
25
+ 4. Ensure the status line is cleared immediately when verbose toggles ON (so the last timer line doesn't linger above the verbose output). This may require hooking into the toggle event or simply having the next tick handle it
26
+ 5. Verify that toggling OFF restores the status line with the correct elapsed time on the next tick
27
+ 6. Add tests for the new behavior: timer callback respects verbose state, status line cleared on verbose ON, resumed on verbose OFF
28
+
29
+ ## Acceptance Criteria
30
+ - [ ] Toggling verbose ON clears the status line and stops timer/task-name display
31
+ - [ ] Toggling verbose OFF resumes the timer/status line with correct elapsed time
32
+ - [ ] No task name prefix appears on verbose output lines
33
+ - [ ] Starting with `--verbose` flag still works as before (no timer at all)
34
+ - [ ] Timer internally tracks elapsed time correctly regardless of display state
35
+ - [ ] All existing tests pass
36
+
37
+ ## Notes
38
+ - The key files are `src/commands/do.ts` (timer callback setup around line 914), `src/utils/status-line.ts`, `src/utils/timer.ts`, and `src/utils/verbose-toggle.ts`
39
+ - The fix is likely a small change to the `onTick` callback — check `verboseToggle.isVerbose` and conditionally clear/update the status line
40
+ - Be careful with the edge case where `verbose` is the initial flag (no toggle exists) vs. runtime toggle via Tab
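Reviewer sketch (not part of the package): step 3 amounts to gating the tick callback on the toggle state. The interfaces and the `makeOnTick` helper below are hypothetical; only `isVerbose` and the idea of an `onTick` callback come from the plan, and the real status-line API in `src/utils/status-line.ts` is not shown in this diff.

```typescript
// Minimal stand-ins for the toggle and status line; the real modules may differ.
interface VerboseState {
  isVerbose: boolean;
}

interface StatusLineLike {
  update(text: string): void;
  clear(): void;
}

// Build a tick callback that renders the status line only while verbose is OFF.
// The timer keeps counting either way, so the elapsed value passed in stays accurate
// and the display resumes from the true elapsed time after toggling back.
function makeOnTick(
  toggle: VerboseState,
  status: StatusLineLike,
  taskName: string,
): (elapsedSeconds: number) => void {
  return (elapsedSeconds: number): void => {
    if (toggle.isVerbose) {
      // Verbose output is streaming: keep the status line cleared and skip rendering.
      status.clear();
      return;
    }
    status.update(`● ${taskName} ${elapsedSeconds}s`);
  };
}
```

With this shape, the status line is cleared no later than the first tick after the toggle. Clearing at the exact moment of the toggle (step 4) would still need a hook on the toggle itself.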
@@ -1 +1 @@
1
- {"version":3,"file":"config.d.ts","sourceRoot":"","sources":["../../src/commands/config.ts"],"names":[],"mappings":"AAIA,OAAO,EAAE,OAAO,EAAE,MAAM,WAAW,CAAC;AA8GpC,wBAAgB,mBAAmB,IAAI,OAAO,CAgB7C"}
1
+ {"version":3,"file":"config.d.ts","sourceRoot":"","sources":["../../src/commands/config.ts"],"names":[],"mappings":"AAIA,OAAO,EAAE,OAAO,EAAE,MAAM,WAAW,CAAC;AAiHpC,wBAAgB,mBAAmB,IAAI,OAAO,CAgB7C"}
@@ -6,7 +6,8 @@ import { Command } from 'commander';
6
6
  import { ClaudeRunner } from '../core/claude-runner.js';
7
7
  import { shutdownHandler } from '../core/shutdown-handler.js';
8
8
  import { logger } from '../utils/logger.js';
9
- import { getConfigPath, getModel, getEffort, validateConfig, ConfigValidationError, } from '../utils/config.js';
9
+ import { getConfigPath, getModel, getEffort, getModelShortName, validateConfig, ConfigValidationError, resetConfigCache, } from '../utils/config.js';
10
+ import { DEFAULT_CONFIG } from '../types/config.js';
10
11
  /**
11
12
  * Load the config documentation markdown from src/prompts/config-docs.md.
12
13
  * Resolved relative to this file's location in the dist/ tree.
@@ -128,8 +129,29 @@ async function handleReset() {
128
129
  }
129
130
  async function runConfigSession(initialPrompt) {
130
131
  const configPath = getConfigPath();
131
- const model = getModel('config');
132
- const effort = getEffort('config');
132
+ // Try to load config, but fall back to defaults if it's broken
133
+ // This allows raf config to be used to fix a broken config file
134
+ let model;
135
+ let effort;
136
+ let configError = null;
137
+ try {
138
+ model = getModel('config');
139
+ effort = getEffort('config');
140
+ }
141
+ catch (error) {
142
+ // Config file has errors - fall back to defaults so the session can launch
143
+ configError = error instanceof Error ? error : new Error(String(error));
144
+ model = DEFAULT_CONFIG.models.config;
145
+ effort = DEFAULT_CONFIG.effort.config;
146
+ // Clear the cached config so subsequent calls don't use the broken cache
147
+ resetConfigCache();
148
+ }
149
+ // Warn user if config has errors, before starting the session
150
+ if (configError) {
151
+ logger.warn(`Config file has errors, using defaults: ${configError.message}`);
152
+ logger.warn('Fix the config in this session or run `raf config --reset` to start fresh.');
153
+ logger.newline();
154
+ }
133
155
  // Set effort level env var for the Claude session
134
156
  process.env['CLAUDE_CODE_EFFORT_LEVEL'] = effort;
135
157
  // Load config docs
@@ -151,8 +173,8 @@ async function runConfigSession(initialPrompt) {
151
173
  const claudeRunner = new ClaudeRunner({ model });
152
174
  shutdownHandler.init();
153
175
  shutdownHandler.registerClaudeRunner(claudeRunner);
154
- logger.info('Starting config session with Claude...');
155
- logger.info(`Using model: ${model}`);
176
+ const configModel = getModelShortName(model);
177
+ logger.info(`Starting config session with ${configModel}...`);
156
178
  logger.newline();
157
179
  try {
158
180
  const exitCode = await claudeRunner.runInteractive(systemPrompt, userMessage, {