npm - rafcode - Versions diffs - 2.3.0 → 2.4.0 - Mend

rafcode 2.3.0 → 2.4.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (109)

package/CLAUDE.md +19 -4
package/RAF/ahvrih-rate-forge/decisions.md +70 -0
package/RAF/ahvrih-rate-forge/input.md +44 -0
package/RAF/ahvrih-rate-forge/outcomes/01-remove-claude-command-config.md +58 -0
package/RAF/ahvrih-rate-forge/outcomes/02-fix-mixed-attempt-cost.md +46 -0
package/RAF/ahvrih-rate-forge/outcomes/03-rate-limit-estimation.md +82 -0
package/RAF/ahvrih-rate-forge/outcomes/04-show-version-in-do-logs.md +45 -0
package/RAF/ahvrih-rate-forge/outcomes/05-sync-main-before-worktree.md +96 -0
package/RAF/ahvrih-rate-forge/outcomes/06-sync-readme-with-codebase.md +45 -0
package/RAF/ahvrih-rate-forge/outcomes/07-no-session-persistence.md +26 -0
package/RAF/ahvrih-rate-forge/outcomes/08-plan-execution-metadata.md +130 -0
package/RAF/ahvrih-rate-forge/plans/01-remove-claude-command-config.md +36 -0
package/RAF/ahvrih-rate-forge/plans/02-fix-mixed-attempt-cost.md +33 -0
package/RAF/ahvrih-rate-forge/plans/03-rate-limit-estimation.md +82 -0
package/RAF/ahvrih-rate-forge/plans/04-show-version-in-do-logs.md +32 -0
package/RAF/ahvrih-rate-forge/plans/05-sync-main-before-worktree.md +40 -0
package/RAF/ahvrih-rate-forge/plans/06-sync-readme-with-codebase.md +61 -0
package/RAF/ahvrih-rate-forge/plans/07-no-session-persistence.md +28 -0
package/RAF/ahvrih-rate-forge/plans/08-plan-execution-metadata.md +123 -0
package/README.md +27 -7
package/dist/commands/config.d.ts.map +1 -1
package/dist/commands/config.js +1 -6
package/dist/commands/config.js.map +1 -1
package/dist/commands/do.d.ts.map +1 -1
package/dist/commands/do.js +106 -18
package/dist/commands/do.js.map +1 -1
package/dist/commands/plan.d.ts.map +1 -1
package/dist/commands/plan.js +77 -2
package/dist/commands/plan.js.map +1 -1
package/dist/core/claude-runner.d.ts +6 -6
package/dist/core/claude-runner.d.ts.map +1 -1
package/dist/core/claude-runner.js +9 -10
package/dist/core/claude-runner.js.map +1 -1
package/dist/core/failure-analyzer.d.ts.map +1 -1
package/dist/core/failure-analyzer.js +3 -3
package/dist/core/failure-analyzer.js.map +1 -1
package/dist/core/pull-request.js +3 -3
package/dist/core/pull-request.js.map +1 -1
package/dist/core/state-derivation.d.ts +5 -0
package/dist/core/state-derivation.d.ts.map +1 -1
package/dist/core/state-derivation.js +14 -4
package/dist/core/state-derivation.js.map +1 -1
package/dist/core/worktree.d.ts +32 -0
package/dist/core/worktree.d.ts.map +1 -1
package/dist/core/worktree.js +215 -0
package/dist/core/worktree.js.map +1 -1
package/dist/prompts/amend.d.ts.map +1 -1
package/dist/prompts/amend.js +26 -11
package/dist/prompts/amend.js.map +1 -1
package/dist/prompts/planning.d.ts.map +1 -1
package/dist/prompts/planning.js +26 -11
package/dist/prompts/planning.js.map +1 -1
package/dist/types/config.d.ts +30 -13
package/dist/types/config.d.ts.map +1 -1
package/dist/types/config.js +14 -10
package/dist/types/config.js.map +1 -1
package/dist/utils/config.d.ts +47 -4
package/dist/utils/config.d.ts.map +1 -1
package/dist/utils/config.js +176 -30
package/dist/utils/config.js.map +1 -1
package/dist/utils/frontmatter.d.ts +43 -0
package/dist/utils/frontmatter.d.ts.map +1 -0
package/dist/utils/frontmatter.js +85 -0
package/dist/utils/frontmatter.js.map +1 -0
package/dist/utils/name-generator.d.ts.map +1 -1
package/dist/utils/name-generator.js +2 -3
package/dist/utils/name-generator.js.map +1 -1
package/dist/utils/session-parser.d.ts +44 -0
package/dist/utils/session-parser.d.ts.map +1 -0
package/dist/utils/session-parser.js +122 -0
package/dist/utils/session-parser.js.map +1 -0
package/dist/utils/terminal-symbols.d.ts +22 -3
package/dist/utils/terminal-symbols.d.ts.map +1 -1
package/dist/utils/terminal-symbols.js +52 -18
package/dist/utils/terminal-symbols.js.map +1 -1
package/dist/utils/token-tracker.d.ts +20 -0
package/dist/utils/token-tracker.d.ts.map +1 -1
package/dist/utils/token-tracker.js +57 -2
package/dist/utils/token-tracker.js.map +1 -1
package/package.json +1 -1
package/src/commands/config.ts +0 -7
package/src/commands/do.ts +141 -20
package/src/commands/plan.ts +87 -1
package/src/core/claude-runner.ts +16 -17
package/src/core/failure-analyzer.ts +3 -3
package/src/core/pull-request.ts +3 -3
package/src/core/state-derivation.ts +20 -4
package/src/core/worktree.ts +230 -0
package/src/prompts/amend.ts +26 -11
package/src/prompts/config-docs.md +91 -29
package/src/prompts/planning.ts +26 -11
package/src/types/config.ts +46 -21
package/src/utils/config.ts +200 -33
package/src/utils/frontmatter.ts +110 -0
package/src/utils/name-generator.ts +2 -3
package/src/utils/session-parser.ts +161 -0
package/src/utils/terminal-symbols.ts +68 -16
package/src/utils/token-tracker.ts +65 -2
package/tests/unit/claude-runner-interactive.test.ts +8 -6
package/tests/unit/claude-runner.test.ts +5 -66
package/tests/unit/config-command.test.ts +6 -6
package/tests/unit/config.test.ts +268 -45
package/tests/unit/frontmatter.test.ts +182 -0
package/tests/unit/post-execution-picker.test.ts +5 -0
package/tests/unit/session-parser.test.ts +301 -0
package/tests/unit/terminal-symbols.test.ts +142 -0
package/tests/unit/token-tracker.test.ts +304 -1
package/tests/unit/validation.test.ts +6 -4
package/tests/unit/worktree.test.ts +242 -0

package/CLAUDE.md CHANGED

@@ -44,8 +44,10 @@ RAF/
 ### Plan File Structure
-Each plan file follows this structure:
+Each plan file MUST have Obsidian-style frontmatter at the top, before the `# Task:` heading:
 ```markdown
+effort: medium
+---
 # Task: [Task Name]
 ## Objective
@@ -74,6 +76,12 @@ Each plan file follows this structure:
 [Additional context]
 ```
+**Frontmatter**:
+- Uses Obsidian-style format: `key: value` lines followed by `---` (no opening delimiter)
+- `effort` is REQUIRED: `low`, `medium`, or `high` — determines execution model via `effortMapping`
+- `model` is OPTIONAL: explicit model override (subject to ceiling)
+- Frontmatter is parsed by `src/utils/frontmatter.ts`
 **Dependencies Section**:
 - Optional - omit if task has no dependencies
 - Uses task IDs only (e.g., `01, 02`)
@@ -146,18 +154,25 @@ npm run lint       # Type check without emit
 - **Config file**: `~/.raf/raf.config.json` (optional — missing file uses all defaults)
 - **Schema** (defined in `src/types/config.ts`):
   - `models.*` — Claude model per scenario (`execute`, `plan`, `nameGeneration`, `failureAnalysis`, `prGeneration`, `config`)
-  - `effort.*` — effort level per scenario (same scenarios as models)
+  - `effortMapping.*` — maps task effort labels (`low`, `medium`, `high`) to models
   - `timeout` — task timeout in seconds
   - `maxRetries` — max retry attempts per task
   - `autoCommit` — whether Claude auto-commits on task completion
   - `worktree` — default worktree mode for plan/do commands
+  - `syncMainBranch` — sync main branch with remote before worktree/PR operations (default: true)
   - `commitFormat.*` — commit message templates (`task`, `plan`, `amend`, `prefix`)
-  - `claudeCommand` — path/name of the Claude CLI binary
 - **Validation**: strict — unknown keys rejected at every nesting level (`src/utils/config.ts`)
 - **Deep-merge**: partial overrides merge with defaults (only specify keys you want to change)
-- **Helper accessors**: `getModel()`, `getEffort()`, `getCommitFormat()`, `getCommitPrefix()`, `getTimeout()`, `getMaxRetries()`, `getAutoCommit()`, `getWorktreeDefault()`, `getClaudeCommand()` (all in `src/utils/config.ts`)
+- **Helper accessors**: `getModel()`, `getEffortMapping()`, `resolveEffortToModel()`, `getModelTier()`, `applyModelCeiling()`, `getCommitFormat()`, `getCommitPrefix()`, `getTimeout()`, `getMaxRetries()`, `getAutoCommit()`, `getWorktreeDefault()`, `getSyncMainBranch()` (all in `src/utils/config.ts`)
 - **Full reference**: `src/prompts/config-docs.md` (also serves as system prompt for `raf config`)
+### Per-Task Model Resolution
+- Plan files contain `effort` frontmatter that determines which model executes the task
+- `effortMapping` config maps effort labels to models: `{ low: "haiku", medium: "sonnet", high: "opus" }`
+- `models.execute` acts as a ceiling — tasks can't exceed this model tier
+- On retry, tasks escalate to the ceiling model for a better chance of success
+- If a plan has no effort frontmatter, `models.execute` is used as a fallback (with a warning)
 ### `raf config` Command
 - `raf config` — launches interactive Claude session for viewing/editing config
 - `raf config "use haiku for name generation"` — session with initial prompt
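The Obsidian-style frontmatter described in the Plan File Structure hunk above (`key: value` lines, closing `---` only, before the `# Task:` heading) could be parsed roughly as follows. This is a minimal sketch, not the contents of `src/utils/frontmatter.ts`, which this diff does not show; the names `parsePlanFrontmatter`, `ParsedPlan`, and `EFFORT_LABELS` are hypothetical, and the warning behavior is an assumption.

```typescript
// Hypothetical sketch; the real parser in src/utils/frontmatter.ts may differ.
type Effort = "low" | "medium" | "high";

interface PlanFrontmatter {
  effort?: Effort;
  model?: string;
}

interface ParsedPlan {
  frontmatter: PlanFrontmatter;
  body: string;
  warnings: string[];
}

const EFFORT_LABELS: readonly string[] = ["low", "medium", "high"];

function parsePlanFrontmatter(content: string): ParsedPlan {
  const lines = content.split(/\r?\n/);
  const frontmatter: PlanFrontmatter = {};
  const warnings: string[] = [];

  // The closing `---` must come before the first Markdown heading;
  // otherwise the file is treated as having no frontmatter.
  const closeIdx = lines.findIndex((l) => l.trim() === "---");
  const headingIdx = lines.findIndex((l) => l.startsWith("# "));
  if (closeIdx === -1 || (headingIdx !== -1 && headingIdx < closeIdx)) {
    return { frontmatter, body: content, warnings: ["no frontmatter found"] };
  }

  for (const line of lines.slice(0, closeIdx)) {
    if (line.trim() === "") continue;
    const sep = line.indexOf(":");
    if (sep === -1) {
      warnings.push(`ignored line: ${line}`);
      continue;
    }
    const key = line.slice(0, sep).trim();
    const value = line.slice(sep + 1).trim();
    if (key === "effort") {
      if (EFFORT_LABELS.includes(value)) frontmatter.effort = value as Effort;
      else warnings.push(`invalid effort: ${value}`);
    } else if (key === "model") {
      frontmatter.model = value;
    } else {
      warnings.push(`unknown key: ${key}`);
    }
  }

  return { frontmatter, body: lines.slice(closeIdx + 1).join("\n"), warnings };
}
```

Returning warnings instead of throwing matches the documented fallback, where a plan without `effort` still runs on `models.execute` with a warning.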

package/RAF/ahvrih-rate-forge/decisions.md ADDED

@@ -0,0 +1,70 @@
+# Project Decisions
+## Where should the 5h window rate limit percentage be displayed?
+Per-task + total — show running 5h window % after each task AND in the final total summary.
+## Should the Sonnet baseline cap (88,000) be configurable?
+Yes, configurable — add a config key so users can adjust the Sonnet-equivalent token cap.
+## Should display toggles (rate limit %, cache tokens) be top-level or nested?
+Nested under `display` section — e.g., `display.showRateLimitEstimate: true`, `display.showCacheTokens: true`.
+## What approach for fixing mixed-attempt cost underreporting?
+Per-attempt cost — calculate cost for each attempt independently, then sum. Each attempt uses per-model if available, aggregate-fallback otherwise.
+## Should the branch name be configurable for worktree mode?
+No — the user clarified that "branch name" referred to the **main branch** (main, master, etc.) that gets pushed/pulled, not the worktree feature branch. Main branch should be auto-detected from `origin/HEAD`.
+## How should RAF version be displayed in `raf do` logs?
+Single combined line at the start of execution — e.g., `RAF v2.3.0 | Model: claude-opus-4-6 | Effort: high`.
+## Should `claudeCommand` be removed entirely or deprecated?
+Remove entirely — always use 'claude' as the command name, remove the config key, accessor, and all references.
+## Does removing `claudeCommand` also fix the PR #4 review comment about `raf config` fallback?
+Yes — removing `claudeCommand` means `getClaudePath()` will hardcode 'claude', so it can't throw due to broken config. No separate task needed; verification included in the removal task.
+## Should push-main and pull-main be controlled by one config key or two?
+Single key — `syncMainBranch: true` controls both pushing main before PR and pulling main before worktree creation.
+## For the main branch: config key or auto-detect?
+Auto-detect from `origin/HEAD` — no config key needed.
+## What scope for the README sync task?
+Critical only — fix `--merge` flag references, document the post-execution picker, and document PR creation from worktree. Defer medium/low items.
+## How should token tracking work for `raf plan` interactive sessions?
+Parse Claude CLI's session JSONL file after the interactive session ends. Pass `--session-id <uuid>` to `runInteractive()` so we know exactly which file to read. Session files are stored at `~/.claude/projects/<escaped-path>/<session-id>.jsonl` and contain usage data in assistant message entries.
+## Should token tracking for `raf plan` be a separate task or combined?
+Combined with task 03 (rate-limit-estimation) since it touches the same token tracking and display infrastructure.
+## Should `--no-session-persistence` be added to PR description generation only, or also failure analysis?
+Both — add `--no-session-persistence` to both `callClaudeForPrBody()` in `pull-request.ts` AND the failure analyzer in `failure-analyzer.ts`. Both are throwaway Claude calls that shouldn't pollute session history.
+## Model recommendation for all tasks in this project?
+All tasks are well-suited for Sonnet. Plans are detailed enough that none require Opus-level reasoning.
+## What format for model/effort markers in plan files?
+Obsidian-style properties at the top of the plan file with only a closing `---` delimiter. E.g., `model: sonnet` and `effort: medium`.
+## Should RAF use plan frontmatter markers during execution?
+Yes — RAF reads `model` from the plan frontmatter to override the global config per-task. The `effort` field is a human-readable complexity label (not Claude's `--effort` flag) that maps to a model via a configurable mapping (e.g., `low` → haiku, `medium` → sonnet, `high` → opus).
+## What does `effort` in plan frontmatter mean?
+It is a human-readable task complexity label, NOT Claude's `--effort` parameter. There is a configurable mapping from effort labels to models (e.g., `effortMapping: { low: "haiku", medium: "sonnet", high: "opus" }`). If both `model` and `effort` are in the frontmatter, `model` takes precedence.
+## Should the `effort.*` config section (Claude's --effort flag) be removed?
+Yes — remove the entire `effort.*` config section, the `EffortConfig` type, `EffortScenario` type, `VALID_EFFORTS`, `getEffort()` accessor, and all `CLAUDE_CODE_EFFORT_LEVEL` env var usage. Claude CLI will use its own default effort level.
+## How detailed should plans be regarding implementation?
+Neutral — remove the "no implementation details" restriction from planning prompts, but don't actively encourage code snippets. Let the planning model decide the appropriate detail level naturally.
+## How should config and plan frontmatter interact for model selection?
+Config as ceiling. `models.execute` is redefined as the **maximum model tier** allowed for task execution. The planner sets `effort` per task (required), which maps to a model via `effortMapping`. The final execution model is `min(mapped model, models.execute)` where "min" means the cheaper/lower-tier model. This gives users budget control while letting the planner differentiate task complexity. Model tier ordering: haiku < sonnet < opus.
+## Should plan frontmatter effort be required or optional?
+Required — the planning prompt mandates effort frontmatter on every task. If missing (e.g., manually created plans), warn and fall back to the config default.
+## What happens on retry when a task used a cheaper model?
+Bump to ceiling — on retry, use `models.execute` (the ceiling model) instead of the original frontmatter-resolved model. If the first attempt was already at the ceiling, retry with the same model.
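The model-selection decisions above (explicit `model` beats `effort`, the result is capped at `models.execute`, and retries bump to the ceiling) can be sketched as one small function. This is an illustration under stated assumptions, not RAF's code: `resolveTaskModel`, `capAtCeiling`, and `TIER` are hypothetical names, and the sketch works on short aliases only.

```typescript
// Hypothetical sketch of "config as ceiling"; not the RAF implementation.
type ModelAlias = "haiku" | "sonnet" | "opus";
type Effort = "low" | "medium" | "high";

// Tier ordering from decisions.md: haiku < sonnet < opus.
const TIER: Record<ModelAlias, number> = { haiku: 1, sonnet: 2, opus: 3 };

// min(requested, ceiling), where "min" means the lower tier.
function capAtCeiling(requested: ModelAlias, ceiling: ModelAlias): ModelAlias {
  return TIER[requested] > TIER[ceiling] ? ceiling : requested;
}

function resolveTaskModel(
  frontmatter: { effort?: Effort; model?: ModelAlias },
  effortMapping: Record<Effort, ModelAlias>,
  ceiling: ModelAlias, // value of models.execute
  attempt: number, // 1 for the first attempt, 2+ for retries
): ModelAlias {
  // Retries always run on the ceiling model.
  if (attempt > 1) return ceiling;

  // An explicit `model` takes precedence over `effort`.
  const requested =
    frontmatter.model ??
    (frontmatter.effort ? effortMapping[frontmatter.effort] : undefined);

  // No frontmatter at all: fall back to models.execute.
  if (requested === undefined) return ceiling;

  return capAtCeiling(requested, ceiling);
}

const mapping: Record<Effort, ModelAlias> = {
  low: "haiku",
  medium: "sonnet",
  high: "opus",
};

// A `high` task under a sonnet ceiling is capped at sonnet.
console.log(resolveTaskModel({ effort: "high" }, mapping, "sonnet", 1));
```

With this shape, the ceiling is the only place a user's budget preference enters, so the planner can label tasks freely without exceeding it.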

package/RAF/ahvrih-rate-forge/input.md ADDED

@@ -0,0 +1,44 @@
+- [ ] add an estimated percentage of the 5h window. make the default cap 88000 (these are Sonnet tokens; for other models, convert based on model pricing). Anthropic says there is one shared token/credit pool per 5‑hour window, not separate “in” and “out” caps. calculate the sum of in and out tokens and divide by 88000 (for example)
+		- API prices tell you the relative weight:
+		•	Example current non‑batch API pricing:
+		•	Haiku 4.5: 1 $/M input, 5 $/M output
+		•	Sonnet 4.5: 3 $/M input, 15 $/M output
+		•	Opus 4.5 / 4.6: 5 $/M input, 25 $/M output
+		•	That means, on the API:
+		•	Opus input token ≈ 1.7× a Sonnet input token, 5× a Haiku input token.
+		•	Opus output token ≈ 1.7× a Sonnet output token, 5× a Haiku output token.
+		3.	Most reverse‑engineering + Anthropic’s own recommendations assume the subscription credits follow the same ratios. So in practice, on the same 5‑hour window:
+		•	If you spend it all on Sonnet, you get about the “44k / 88k / 220k tokens per 5 h” that people quote.
+		•	If you instead spend it all on Opus, you will hit the same credit ceiling with fewer raw tokens, roughly scaled by the price ratio (so ballpark ~60% of the Sonnet tokens, given Opus is ~1.7× the cost per token).
+		•	If you use Haiku, you can squeeze many more tokens into the same 5‑hour credit window, again roughly proportionate to its much lower price.
+	So when saying “heavier models like Opus eat that pool faster” it means:
+		•	The 5‑hour window is measured in cost‑weighted credits, not simple token count.
+		•	One Opus token “costs” more of that pool than one Sonnet token, in about the same ratio as the API prices.
+		•	Therefore, for the same 5‑hour cap, you get fewer total tokens with Opus, more with Haiku, middle with Sonnet.
+		also make it configurable whether to show the session % estimate; showing the cache estimate should be configurable too.
+- [ ] `addTask` now prices a task from `accumulateUsage(attempts)`, but that merge can include attempts where only aggregate usage fields are present and `modelUsage` is empty (which `extractUsageData` allows when `event.modelUsage` is absent). If any other attempt has `modelUsage`, `calculateCost` takes the per-model branch and ignores aggregate-only tokens, so mixed-attempt retries underreport cost; compute cost per attempt or carry unmatched aggregate tokens into pricing.
+- [ ] push main to remote before making PR and pull main before creating gitworktree (configurable option, default true). also branch name is configurable
+- [ ] remove "claudeCommand": "claude" from config ( update docs)
+- [ ] in "raf do" logs show version which is used for execution in full format
+- [ ] address PR (https://github.com/john-veresk/raf/pull/4) review comment [src/commands/config.ts](https://github.com/john-veresk/raf/pull/4/files/8b5786ed7c5e8aaec01cfa47e447550c2a684792#diff-8c467368fd64da9077ad462358fce5589607dfbfe03ac72941c7d74dbfba52aa)
+		Comment on lines +172 to +173
+		|   |
+		|---|
+		|model = DEFAULT_CONFIG.models.config;|
+		|effort = DEFAULT_CONFIG.effort.config;|
+		###
+		 **[![P1 Badge](https://camo.githubusercontent.com/c595229c0ecb6ee85b9c7804144d495f131a495ec87091fea2b262d954c9a92d/68747470733a2f2f696d672e736869656c64732e696f2f62616467652f50312d6f72616e67653f7374796c653d666c6174)](https://camo.githubusercontent.com/c595229c0ecb6ee85b9c7804144d495f131a495ec87091fea2b262d954c9a92d/68747470733a2f2f696d672e736869656c64732e696f2f62616467652f50312d6f72616e67653f7374796c653d666c6174) Use default Claude command when config parsing fails**
+		The new recovery path only falls back `model`/`effort` after config parsing errors, but `runInteractive()` still resolves the CLI binary via `getClaudeCommand()` inside `getClaudePath` (`src/core/claude-runner.ts`). With a malformed `~/.raf/raf.config.json`, that second config read still throws before the interactive session starts, so `raf config` remains unusable as a repair path even after showing the fallback warning.
+---
+- [ ] add token tracker feature from 'raf do' to 'raf plan', display stat AFTER planning interactive session (combined with task 03)
+- [ ] add --no-session-persistence to PR description generation and failure analysis Claude calls (like name-generator already has)
+- [ ] add per-task model/effort frontmatter metadata to plan files, RAF reads and uses during execution; remove "no implementation details" restriction from planning prompts

package/RAF/ahvrih-rate-forge/outcomes/01-remove-claude-command-config.md ADDED

@@ -0,0 +1,58 @@
+# Outcome: Remove `claudeCommand` from Config
+## Summary
+Removed the `claudeCommand` configuration key entirely from the RAF config system. The Claude CLI binary name is now hardcoded as `"claude"` throughout the codebase. This simplifies the config schema and eliminates a failure path where a malformed config file could prevent `raf config` from launching as a repair tool.
+## Key Changes
+### Types (`src/types/config.ts`)
+- Removed `claudeCommand: string` from `RafConfig` interface
+- Removed `claudeCommand: 'claude'` from `DEFAULT_CONFIG`
+- Removed `claudeCommand` from deprecated `DEFAULT_RAF_CONFIG` export
+### Config Utilities (`src/utils/config.ts`)
+- Removed `claudeCommand` from `VALID_TOP_LEVEL_KEYS` set
+- Removed `claudeCommand` validation logic
+- Removed `claudeCommand` handling from `deepMerge()` function
+- Removed `getClaudeCommand()` accessor function
+- Updated deprecated `loadConfig()` return type to exclude `claudeCommand`
+### Claude Runner (`src/core/claude-runner.ts`)
+- Updated `getClaudePath()` to use hardcoded `'which claude'` instead of calling `getClaudeCommand()`
+- Removed `getClaudeCommand` import
+### Failure Analyzer (`src/core/failure-analyzer.ts`)
+- Updated `getClaudePath()` to use hardcoded `'which claude'`
+- Removed `getClaudeCommand` import
+### Pull Request (`src/core/pull-request.ts`)
+- Updated `callClaudeForPrBody()` to use hardcoded `'which claude'`
+- Removed `getClaudeCommand` import
+### Name Generator (`src/utils/name-generator.ts`)
+- Updated `runClaudePrint()` to spawn `'claude'` directly instead of using `getClaudeCommand()`
+- Removed `getClaudeCommand` import
+### Documentation
+- Updated `src/prompts/config-docs.md` to remove `claudeCommand` section and references
+- Updated `CLAUDE.md` to remove `claudeCommand` from config schema and helper accessor list
+### Tests (`tests/unit/config.test.ts`)
+- Removed `getClaudeCommand` import
+- Removed `claudeCommand` from full valid config test
+- Removed tests for invalid `claudeCommand` values (empty, whitespace-only, non-string)
+- Removed `claudeCommand` default value test
+- Added test to verify that `claudeCommand` is now rejected as an unknown key
+## Acceptance Criteria Verification
+- [x] `claudeCommand` key no longer exists in types, defaults, validation, or documentation
+- [x] `getClaudePath()` works without reading any config
+- [x] `raf config` can launch successfully even with a malformed config file (config is no longer needed for CLI path resolution)
+- [x] All config tests pass (92 passed)
+- [x] Config files containing `claudeCommand` are handled gracefully (rejected as unknown key with clear error message)
+## Notes
+- The pre-existing test failures in `claude-runner-interactive.test.ts` and `validation.test.ts` are unrelated to this change - they concern model resolution expecting short aliases but receiving full model IDs
+- This change addresses PR #4 review comment about `raf config` being unusable as a repair path when config is malformed
+<promise>COMPLETE</promise>

package/RAF/ahvrih-rate-forge/outcomes/02-fix-mixed-attempt-cost.md ADDED

@@ -0,0 +1,46 @@
+# Outcome: Fix Mixed-Attempt Cost Underreporting
+## Summary
+Fixed the cost calculation in `TokenTracker.addTask()` to compute cost per-attempt rather than on merged/accumulated usage data. This prevents underreporting when task attempts have mixed `modelUsage` presence (some with per-model breakdown, others with only aggregate fields).
+## Key Changes
+### `src/utils/token-tracker.ts`
+1. **Added `sumCostBreakdowns()` helper function** (lines 27-45)
+   - Sums multiple `CostBreakdown` objects into a single total
+   - Sums all cost fields: `inputCost`, `outputCost`, `cacheReadCost`, `cacheCreateCost`, `totalCost`
+2. **Modified `addTask()` method** (lines 101-110)
+   - Now calculates cost per-attempt using `this.calculateCost()` on each individual `UsageData`
+   - Sums per-attempt costs using `sumCostBreakdowns()`
+   - This ensures attempts with empty `modelUsage` correctly fall back to sonnet pricing independently, rather than being silently dropped when merged with attempts that have `modelUsage`
+### `tests/unit/token-tracker.test.ts`
+Added new test suites:
+1. **`describe('mixed-attempt cost calculation (aggregate + modelUsage)')`** - 4 tests:
+   - `should correctly price attempts with mixed modelUsage presence` - Core mixed-attempt scenario
+   - `should not underreport cost when first attempt has no modelUsage` - Order independence
+   - `should handle all aggregate-only attempts` - Both attempts use sonnet fallback
+   - `should include cache costs from aggregate-only attempts` - Cache token handling
+2. **`describe('sumCostBreakdowns')`** - 3 tests:
+   - `should return zero breakdown for empty array`
+   - `should return same breakdown for single element`
+   - `should sum all cost fields across breakdowns`
+## Acceptance Criteria Verification
+- [x] Cost is calculated per-attempt, not on merged usage
+- [x] Mixed attempts (some with modelUsage, some without) report accurate total cost
+- [x] Per-attempt display in multi-attempt summaries shows correct individual costs (unchanged - `formatTaskTokenSummary` already uses `calculateAttemptCost` callback)
+- [x] Grand total cost across all tasks remains accurate (entry costs are summed in `getTotals()`)
+- [x] New test cases cover the mixed-attempt edge case
+- [x] Existing token tracking tests still pass (34 tests pass)
+## Notes
+- The pre-existing test failures in `claude-runner-interactive.test.ts` and `validation.test.ts` are unrelated to this change - they concern model resolution expecting short aliases but receiving full model IDs
+- Token count accumulation (`accumulateUsage()`) remains unchanged - only cost calculation was modified
+<promise>COMPLETE</promise>
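The per-attempt pricing fix described in this outcome can be illustrated with a small sketch. The `CostBreakdown` field names come from the outcome text; everything else is an assumption made for illustration: the `AttemptUsage` shape, the helper names `priceTokens`, `priceAttempt`, and `priceTask`, the use of short aliases as `modelUsage` keys, and the omission of cache pricing. Prices are the per-million-token figures quoted in `input.md`.

```typescript
// Hypothetical shapes; the real types in src/utils/token-tracker.ts may differ.
interface TokenCounts {
  inputTokens: number;
  outputTokens: number;
}

interface AttemptUsage extends TokenCounts {
  // Per-model breakdown; empty when only aggregate totals were reported.
  modelUsage: Record<string, TokenCounts>;
}

interface CostBreakdown {
  inputCost: number;
  outputCost: number;
  cacheReadCost: number;
  cacheCreateCost: number;
  totalCost: number;
}

interface Rate {
  input: number; // USD per million input tokens
  output: number; // USD per million output tokens
}

const SONNET_RATE: Rate = { input: 3, output: 15 };
const RATES = new Map<string, Rate>([
  ["haiku", { input: 1, output: 5 }],
  ["sonnet", SONNET_RATE],
  ["opus", { input: 5, output: 25 }],
]);

function priceTokens(model: string, t: TokenCounts): CostBreakdown {
  const rate = RATES.get(model) ?? SONNET_RATE; // unknown model: Sonnet rates (assumption)
  const inputCost = (t.inputTokens / 1_000_000) * rate.input;
  const outputCost = (t.outputTokens / 1_000_000) * rate.output;
  // Cache costs are left at zero to keep the sketch short.
  return { inputCost, outputCost, cacheReadCost: 0, cacheCreateCost: 0, totalCost: inputCost + outputCost };
}

function sumCostBreakdowns(costs: CostBreakdown[]): CostBreakdown {
  const total: CostBreakdown = { inputCost: 0, outputCost: 0, cacheReadCost: 0, cacheCreateCost: 0, totalCost: 0 };
  for (const c of costs) {
    total.inputCost += c.inputCost;
    total.outputCost += c.outputCost;
    total.cacheReadCost += c.cacheReadCost;
    total.cacheCreateCost += c.cacheCreateCost;
    total.totalCost += c.totalCost;
  }
  return total;
}

// Each attempt is priced on its own: per-model when a breakdown exists,
// Sonnet fallback on the aggregate fields otherwise.
function priceAttempt(a: AttemptUsage): CostBreakdown {
  const perModel = Object.entries(a.modelUsage);
  if (perModel.length === 0) return priceTokens("sonnet", a);
  return sumCostBreakdowns(perModel.map(([model, t]) => priceTokens(model, t)));
}

function priceTask(attempts: AttemptUsage[]): CostBreakdown {
  return sumCostBreakdowns(attempts.map(priceAttempt));
}
```

Because the fallback decision is made per attempt, an aggregate-only retry can no longer be dropped just because another attempt carried a `modelUsage` breakdown.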

package/RAF/ahvrih-rate-forge/outcomes/03-rate-limit-estimation.md ADDED

@@ -0,0 +1,82 @@
+# Outcome: Add 5h Window Rate Limit Estimation + Plan Session Token Tracking
+## Summary
+Implemented estimated 5-hour rate limit window percentage display and token usage tracking for `raf plan` interactive sessions.
+## Key Changes
+### Config Types (`src/types/config.ts`)
+- Added `DisplayConfig` interface with `showRateLimitEstimate` and `showCacheTokens` boolean fields
+- Added `RateLimitWindowConfig` interface with `sonnetTokenCap` number field
+- Added both to `RafConfig` with defaults: `showRateLimitEstimate: true`, `showCacheTokens: true`, `sonnetTokenCap: 88000`
+### Config Validation (`src/utils/config.ts`)
+- Added `display` and `rateLimitWindow` to `VALID_TOP_LEVEL_KEYS`
+- Added validation for `display` (object with boolean values for known keys)
+- Added validation for `rateLimitWindow` (object with positive number for `sonnetTokenCap`)
+- Added deep merge support for both new sections in `deepMerge()`
+- Added accessor helpers: `getDisplayConfig()`, `getRateLimitWindowConfig()`, `getShowRateLimitEstimate()`, `getShowCacheTokens()`, `getSonnetTokenCap()`
+### Config Documentation (`src/prompts/config-docs.md`)
+- Added `display` section explaining `showRateLimitEstimate` and `showCacheTokens`
+- Added `rateLimitWindow` section explaining `sonnetTokenCap` and the conversion formula
+- Updated validation rules and full example config
+### Token Tracker (`src/utils/token-tracker.ts`)
+- Added `calculateRateLimitPercentage(totalCost, sonnetTokenCap?)` method
+  - Converts cost to Sonnet-equivalent tokens using average Sonnet pricing
+  - Formula: `sonnetEquivalentTokens = cost / avgSonnetCostPerToken`, then `percentage = tokens / cap * 100`
+- Added `getCumulativeRateLimitPercentage(sonnetTokenCap?)` method for grand totals
+### Terminal Formatting (`src/utils/terminal-symbols.ts`)
+- Added `TokenSummaryOptions` interface for display configuration
+- Added `formatRateLimitPercentage(percentage)` helper (uses tilde prefix for estimate indicator)
+- Updated `formatTokenLine()` to accept options and conditionally show cache tokens and rate limit
+- Updated `formatTaskTokenSummary()` to accept options (rate limit only on total for multi-attempt)
+- Updated `formatTokenTotalSummary()` to accept options
+### Claude Runner (`src/core/claude-runner.ts`)
+- Added `sessionId` option to `ClaudeRunnerOptions`
+- Updated `runInteractive()` to pass `--session-id <uuid>` when sessionId is provided
+### Session Parser (`src/utils/session-parser.ts`) - NEW FILE
+- `escapeProjectPath(path)` - escapes project path for Claude's naming scheme
+- `getSessionFilePath(sessionId, cwd)` - computes expected session file location
+- `parseSessionFile(filePath)` - parses JSONL, accumulates usage from assistant messages
+- `parseSessionById(sessionId, cwd)` - convenience wrapper
+- Handles missing files, malformed JSON lines, entries without usage/model gracefully
+### Plan Command (`src/commands/plan.ts`)
+- Generates UUID session ID before `runInteractive()` calls
+- Passes sessionId to both plan and amend flows
+- Added `displayPlanSessionTokenSummary()` to parse session file and display formatted usage
+  - Uses `TokenTracker` and display config for consistent formatting
+  - Logs debug message if session file not found (graceful degradation)
+### Tests Added
+- `tests/unit/config.test.ts`: 15 tests for display/rateLimitWindow validation and resolution
+- `tests/unit/token-tracker.test.ts`: 7 tests for rate limit percentage calculation
+- `tests/unit/terminal-symbols.test.ts`: 13 tests for display options and rate limit formatting
+- `tests/unit/session-parser.test.ts`: 15 tests (new file) for session file parsing
+## Acceptance Criteria Verification
+- [x] After each task, token summary includes `~X% of 5h window` when enabled
+- [x] Grand total summary includes cumulative percentage when enabled
+- [x] Percentage correctly reflects cost-weighted usage (Opus > Sonnet > Haiku)
+- [x] Multi-model tasks correctly account for different models across attempts
+- [x] `display.showRateLimitEstimate: false` hides the percentage
+- [x] `display.showCacheTokens: false` hides cache token counts
+- [x] `rateLimitWindow.sonnetTokenCap` correctly adjusts the denominator
+- [x] Config validation accepts the new keys
+- [x] Config docs updated with new keys and explanation
+- [x] After `raf plan` interactive session, token usage summary is displayed
+- [x] After `raf plan --amend` interactive session, token usage summary is displayed
+- [x] Session file parsing handles missing/malformed files gracefully
+- [x] Tests cover conversion math, display toggling, and session file parsing
+## Notes
+- The pre-existing test failures in `claude-runner-interactive.test.ts` and `validation.test.ts` are unrelated to this change - they concern model resolution and effort level handling
+- The rate limit percentage is deliberately marked as an estimate with tilde (~) prefix since the actual Anthropic algorithm may differ
+- Session file location is based on Claude CLI's current storage format (`~/.claude/projects/<escaped-path>/<session-id>.jsonl`)
+<promise>COMPLETE</promise>
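The conversion formula quoted in this outcome (`sonnetEquivalentTokens = cost / avgSonnetCostPerToken`, then `percentage = tokens / cap * 100`) can be worked through in a few lines. The outcome does not say how "average Sonnet pricing" is computed; the sketch below assumes the plain mean of the input and output prices from `input.md` ($3 and $15 per million tokens). The function names and the rounding to whole percent are likewise hypothetical.

```typescript
// Assumption: "average Sonnet pricing" is the mean of input and output price.
const SONNET_INPUT_USD_PER_MTOK = 3;
const SONNET_OUTPUT_USD_PER_MTOK = 15;
const AVG_SONNET_COST_PER_TOKEN =
  (SONNET_INPUT_USD_PER_MTOK + SONNET_OUTPUT_USD_PER_MTOK) / 2 / 1_000_000;

// Converts a dollar cost into an estimated share of the 5h window.
function estimateWindowPercentage(totalCost: number, sonnetTokenCap = 88_000): number {
  if (sonnetTokenCap <= 0) {
    throw new RangeError("sonnetTokenCap must be positive");
  }
  const sonnetEquivalentTokens = totalCost / AVG_SONNET_COST_PER_TOKEN;
  return (sonnetEquivalentTokens / sonnetTokenCap) * 100;
}

// The tilde marks the value as an estimate; whole-percent rounding is an assumption.
function formatWindowPercentage(percentage: number): string {
  return `~${Math.round(percentage)}% of 5h window`;
}

// $0.396 at $9 per million tokens is 44,000 Sonnet-equivalent tokens,
// which is half of the default 88,000 cap.
console.log(formatWindowPercentage(estimateWindowPercentage(0.396))); // ~50% of 5h window
```

Working from cost rather than raw token counts is what makes the estimate cost-weighted: the same number of Opus tokens costs more and therefore consumes a larger share of the window.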

package/RAF/ahvrih-rate-forge/outcomes/04-show-version-in-do-logs.md ADDED

@@ -0,0 +1,45 @@
+# Outcome: Show RAF Version and Model in `raf do` Logs
+## Summary
+Added a version and model info line at the start of every `raf do` execution, displayed before any task runs. The format is `RAF v{version} | Model: {fullModelId}` and uses dim text styling for subtlety.
+## Key Changes
+### `src/utils/config.ts`
+- Added `MODEL_ALIAS_TO_FULL_ID` constant mapping short aliases to current full model IDs:
+  - `opus` → `claude-opus-4-6`
+  - `sonnet` → `claude-sonnet-4-5-20250929`
+  - `haiku` → `claude-haiku-4-5-20251001`
+- Added `resolveFullModelId(modelName: string): string` function that:
+  - Resolves short aliases (`opus`, `sonnet`, `haiku`) to their full model IDs
+  - Returns full model IDs and unknown strings as-is
+### `src/commands/do.ts`
+- Added imports for `resolveFullModelId` from config.ts and `getVersion` from version.ts
+- Added version/model log line in `executeSingleProject()` at line 729:
+  ```typescript
+  const fullModelId = resolveFullModelId(model);
+  logger.dim(`RAF v${getVersion()} | Model: ${fullModelId}`);
+  ```
+- Removed `showModel` from `SingleProjectOptions` interface (no longer needed since version/model is always shown)
+- Removed `showModel: true` from the call site
+### `tests/unit/config.test.ts`
+- Added import for `resolveFullModelId`
+- Added 3 new tests in `describe('resolveFullModelId')`:
+  - `should resolve short aliases to full model IDs`
+  - `should return full model IDs as-is`
+  - `should return unknown model strings as-is`
+## Acceptance Criteria Verification
+- [x] A version/model line appears at the start of every `raf do` execution
+- [x] Model name is shown in full format (e.g., `claude-opus-4-6`)
+- [x] Line appears before any task execution output
+- [x] Works in both worktree and non-worktree modes
+## Notes
+- The pre-existing test failures in `claude-runner-interactive.test.ts` and `validation.test.ts` are unrelated to this change: those tests expect model resolution to return short aliases but receive full model IDs
+- The version/model line uses `logger.dim()` for subtle display that doesn't clutter output
+- All 111 config tests pass, including the 3 new tests for `resolveFullModelId`
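The alias resolution and log-line format described in this outcome can be sketched as follows. This is a minimal illustration, not the shipped `src/utils/config.ts`: the alias table is copied from the notes above, the use of a `Map` is an assumption, and `formatVersionLine` is a hypothetical helper introduced only to show the documented `RAF v{version} | Model: {fullModelId}` format.

```typescript
// Alias table copied from the outcome notes; storing it in a Map is an assumption.
const MODEL_ALIAS_TO_FULL_ID = new Map<string, string>([
  ['opus', 'claude-opus-4-6'],
  ['sonnet', 'claude-sonnet-4-5-20250929'],
  ['haiku', 'claude-haiku-4-5-20251001'],
]);

// Resolve a short alias to its full model ID.
// Full model IDs and unknown strings pass through unchanged.
function resolveFullModelId(modelName: string): string {
  return MODEL_ALIAS_TO_FULL_ID.get(modelName) ?? modelName;
}

// Hypothetical helper: builds the text of the version/model log line.
function formatVersionLine(version: string, model: string): string {
  return `RAF v${version} | Model: ${resolveFullModelId(model)}`;
}
```

Passing unknown strings through unchanged means a newly released model ID can be used in config without first updating the alias table.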
+<promise>COMPLETE</promise>

package/RAF/ahvrih-rate-forge/outcomes/05-sync-main-before-worktree.md ADDED

@@ -0,0 +1,96 @@
+# Outcome: Sync Main Branch Before Worktree/PR Operations
+## Summary
+Implemented automatic syncing of the main branch with the remote before worktree creation and PR creation, with a configurable toggle. The main branch is auto-detected from `refs/remotes/origin/HEAD`, falling back to `main` or `master`. Failures produce warnings but don't block the workflow.
+## Key Changes
+### Types (`src/types/config.ts`)
+- Added `syncMainBranch: boolean` to `RafConfig` interface
+- Added default `syncMainBranch: true` to `DEFAULT_CONFIG`
+### Config Utilities (`src/utils/config.ts`)
+- Added `syncMainBranch` to `VALID_TOP_LEVEL_KEYS` set
+- Added validation for `syncMainBranch` (must be boolean)
+- Added `syncMainBranch` handling in `deepMerge()` function
+- Added `getSyncMainBranch()` accessor function
+### Worktree Utilities (`src/core/worktree.ts`)
+- Added `SyncMainBranchResult` interface for sync operation results
+- Added `detectMainBranch(cwd?)` function:
+  - Detects main branch from `refs/remotes/origin/HEAD`
+  - Falls back to `main` or `master` if origin/HEAD not set
+- Added `pullMainBranch(cwd?)` function:
+  - When not on main: runs `git fetch origin main:main` (updates the local ref directly, without a checkout)
+  - When on main: runs `git fetch` + `git merge --ff-only`
+  - Handles uncommitted changes, diverged branches, and network errors gracefully
+- Added `pushMainBranch(cwd?)` function:
+  - Pushes main branch to origin
+  - Handles "already up-to-date" and rejection errors gracefully
+### Plan Command (`src/commands/plan.ts`)
+- Imported `getSyncMainBranch` and `pullMainBranch`
+- Added main branch sync before worktree creation in `runPlanCommand()` for fresh worktrees
+- Added main branch sync before worktree creation in `runAmendCommand()` for recreated worktrees
+### Do Command (`src/commands/do.ts`)
+- Imported `getSyncMainBranch`, `pullMainBranch`, and `pushMainBranch`
+- Added main branch sync before worktree operations in `runDoCommand()`
+- Added main branch push before PR creation in `executePostAction()` for the 'pr' case
+### Documentation
+- Updated `src/prompts/config-docs.md`:
+  - Added `syncMainBranch` section with description
+  - Updated validation rules to include `syncMainBranch`
+  - Updated full example config to include `syncMainBranch: true`
+- Updated `CLAUDE.md`:
+  - Added `syncMainBranch` to config schema list
+  - Added `getSyncMainBranch()` to helper accessors list
+### Tests
+- `tests/unit/config.test.ts`:
+  - Added import for `getSyncMainBranch`
+  - Added test for rejecting non-boolean `syncMainBranch`
+  - Added test for accepting boolean `syncMainBranch` values
+  - Added test for overriding `syncMainBranch` in config
+  - Added test for default `syncMainBranch` value (true)
+- `tests/unit/worktree.test.ts`:
+  - Added imports for `detectMainBranch`, `pullMainBranch`, `pushMainBranch`
+  - Added 5 tests for `detectMainBranch()`:
+    - Detecting from origin/HEAD
+    - Detecting master from origin/HEAD
+    - Falling back to main
+    - Falling back to master when main doesn't exist
+    - Returning null when no main branch found
+  - Added 7 tests for `pullMainBranch()`:
+    - Error when main branch cannot be detected
+    - Fetching main when not on main branch
+    - Warning when local main has diverged
+    - Failing when on main with uncommitted changes
+    - Pulling successfully when on main with no changes
+    - Reporting no changes when already up to date
+    - Failing when branch has diverged
+  - Added 4 tests for `pushMainBranch()`:
+    - Error when main branch cannot be detected
+    - Pushing successfully
+    - Reporting no changes when already up to date
+    - Failing when push is rejected
+- `tests/unit/post-execution-picker.test.ts`:
+  - Updated worktree mock to include `pullMainBranch`, `pushMainBranch`, and `detectMainBranch`
+## Acceptance Criteria Verification
+- [x] Main branch is pulled from remote before worktree creation (when `syncMainBranch: true`)
+- [x] Main branch is pushed to remote before PR creation (when `syncMainBranch: true`)
+- [x] Main branch name is auto-detected from `origin/HEAD`
+- [x] `syncMainBranch: false` skips both operations
+- [x] Failures in push/pull produce warnings but don't block the workflow
+- [x] Config validation accepts the new key
+- [x] Config docs updated
+## Notes
+- The pre-existing test failures in `claude-runner-interactive.test.ts` and `validation.test.ts` are unrelated to this change: those tests expect model resolution to return short aliases but receive full model IDs
+- When not on the main branch, the pull uses `git fetch origin main:main` which directly updates the local ref without checking out the branch, avoiding disruption to the user's current work
+- When on the main branch, uncommitted changes will block the pull to prevent data loss
+- The push operation is a plain push of main to origin; if the local main has diverged from the remote, it fails gracefully and requires manual resolution
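The detection order and the two pull strategies described above can be sketched as below. This is an illustration under stated assumptions, not the code in `src/core/worktree.ts`: the `GitRunner` type and `planPull` are hypothetical, git is injected as a function so the logic stays testable, and the exact arguments for the on-main case (`fetch origin <branch>` then `merge --ff-only origin/<branch>`) are an assumption, since the outcome only says `git fetch` + `git merge --ff-only`.

```typescript
// Hypothetical abstraction: runs `git <args>` and returns trimmed stdout,
// or null when the command exits non-zero.
type GitRunner = (args: string[]) => string | null;

// Detect the main branch: origin/HEAD first, then local `main`, then `master`.
function detectMainBranch(git: GitRunner): string | null {
  const prefix = 'refs/remotes/origin/';
  // Prints e.g. "refs/remotes/origin/main" when origin/HEAD is set.
  const head = git(['symbolic-ref', 'refs/remotes/origin/HEAD']);
  if (head !== null && head.startsWith(prefix) && head.length > prefix.length) {
    return head.slice(prefix.length);
  }
  for (const candidate of ['main', 'master']) {
    const ref = `refs/heads/${candidate}`;
    if (git(['rev-parse', '--verify', '--quiet', ref]) !== null) {
      return candidate;
    }
  }
  return null;
}

// Choose the git commands for a pull, depending on the checked-out branch.
function planPull(mainBranch: string, currentBranch: string): string[][] {
  if (currentBranch !== mainBranch) {
    // Updates the local ref directly without checking the branch out.
    return [['fetch', 'origin', `${mainBranch}:${mainBranch}`]];
  }
  // On main: fetch, then fast-forward only (assumed arguments).
  return [
    ['fetch', 'origin', mainBranch],
    ['merge', '--ff-only', `origin/${mainBranch}`],
  ];
}
```

In a real caller, a runner built on `child_process` would replace the injected function, and any failure would be reported as a warning rather than aborting the workflow, as the acceptance criteria require.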
+<promise>COMPLETE</promise>

package/RAF/ahvrih-rate-forge/outcomes/06-sync-readme-with-codebase.md ADDED

@@ -0,0 +1,45 @@
+# Outcome: Sync README with Codebase (Critical Items)
+## Summary
+Fixed critical discrepancies between README.md and the actual codebase implementation, focusing on worktree mode documentation.
+## Key Changes
+### README.md
+1. **Removed `--merge` flag references**
+   - Removed from usage example: `raf do my-feature -w --merge` → `raf do my-feature -w`
+   - Removed from Command Reference table for `raf do`
+   - Updated example description to mention picker will ask what to do after
+2. **Documented post-execution action picker**
+   - Added new "Post-execution picker" section under Worktree Mode
+   - Documented all three options:
+     - "Merge into current branch" — merges with fast-forward preferred
+     - "Create a GitHub PR" — pushes branch and creates PR
+     - "Leave branch as-is" — keeps branch for later
+   - Clarified that picker appears BEFORE task execution
+   - Noted that action is skipped on task failure
+3. **Documented PR creation feature**
+   - Added new "PR creation" section under Worktree Mode
+   - Documented prerequisites: `gh` CLI installed and authenticated
+   - Documented auto-detection of base branch from `origin/HEAD`
+   - Documented PR title generation from project name
+   - Documented Claude-powered PR body generation
+   - Documented fallback behavior when `gh` is unavailable
+4. **Fixed worktree cleanup description**
+   - Changed "Worktrees persist after completion — clean them up manually" to "After successful post-actions (merge, PR, or leave), the worktree directory is cleaned up automatically — the git branch is preserved"
+   - Clarified that on task failure, the worktree is kept for inspection
+   - Updated Basic workflow example to remove `--merge` flag
+## Acceptance Criteria Verification
+- [x] No references to `--merge` flag remain in README
+- [x] Post-execution action picker is documented with all three options
+- [x] PR creation from worktree is documented including prerequisites
+- [x] Worktree cleanup behavior is accurately described
+- [x] All CLI examples use valid, existing flags
+- [x] README reads naturally and doesn't feel patched
+<promise>COMPLETE</promise>

package/RAF/ahvrih-rate-forge/outcomes/07-no-session-persistence.md ADDED

@@ -0,0 +1,26 @@
+# Outcome: Add --no-session-persistence to Throwaway Claude Calls
+## Summary
+Added `--no-session-persistence` flag to PR body generation and failure analysis Claude calls to prevent them from polluting the user's session history (`claude --resume`). This matches the existing pattern in `name-generator.ts`.
+## Key Changes
+### `src/core/pull-request.ts`
+- Added `'--no-session-persistence'` to the spawn args in `callClaudeForPrBody()` (line 371)
+- Flag is placed after `--model` and before `--dangerously-skip-permissions`
+### `src/core/failure-analyzer.ts`
+- Added `'--no-session-persistence'` to the spawn args in `callClaudeForAnalysis()` (line 314)
+- Flag is placed after `--model` and before `--dangerously-skip-permissions`
+## Acceptance Criteria Verification
+- [x] PR body generation sessions don't appear in `claude --resume`
+- [x] Failure analysis sessions don't appear in `claude --resume`
+- [x] Both features still function correctly (output unchanged) - all 97 related tests pass
+- [x] Pattern matches the existing implementation in `name-generator.ts`
+## Notes
+- The `--no-session-persistence` flag only works with `-p` (print mode), which both call sites already use
+- This is a minimal two-line change (one per file) with no behavioral changes to the output
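The flag placement described above can be sketched as an argument builder. This is only an illustration: `buildThrowawayClaudeArgs` is a hypothetical name, and the position of `-p` and the prompt at the end of the list is an assumption. The outcome itself only states that `--no-session-persistence` sits after `--model` and before `--dangerously-skip-permissions`.

```typescript
// Hypothetical helper: builds the argument list for a one-off, non-interactive
// Claude CLI call whose session should not be persisted.
function buildThrowawayClaudeArgs(model: string, prompt: string): string[] {
  return [
    '--model', model,
    // Keeps this call out of the history shown by `claude --resume`.
    '--no-session-persistence',
    '--dangerously-skip-permissions',
    // Print mode; per the notes, --no-session-persistence only works with -p.
    // Placing -p and the prompt last is an assumption.
    '-p', prompt,
  ];
}
```

Centralizing the list in one function like this would let PR body generation and failure analysis share the same flags, though the outcome describes the change as one added line per call site.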
+<promise>COMPLETE</promise>