rafcode 2.1.1 → 2.2.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (120)
  1. package/.claude/settings.local.json +4 -1
  2. package/CLAUDE.md +59 -11
  3. package/RAF/ahslfe-config-wizard/decisions.md +34 -0
  4. package/RAF/ahslfe-config-wizard/input.md +1 -0
  5. package/RAF/ahslfe-config-wizard/outcomes/01-define-config-schema.md +38 -0
  6. package/RAF/ahslfe-config-wizard/outcomes/02-refactor-codebase-to-use-config.md +67 -0
  7. package/RAF/ahslfe-config-wizard/outcomes/03-create-config-documentation.md +37 -0
  8. package/RAF/ahslfe-config-wizard/outcomes/04-implement-raf-config-command.md +47 -0
  9. package/RAF/ahslfe-config-wizard/outcomes/05-update-claude-md.md +26 -0
  10. package/RAF/ahslfe-config-wizard/plans/01-define-config-schema.md +73 -0
  11. package/RAF/ahslfe-config-wizard/plans/02-refactor-codebase-to-use-config.md +74 -0
  12. package/RAF/ahslfe-config-wizard/plans/03-create-config-documentation.md +57 -0
  13. package/RAF/ahslfe-config-wizard/plans/04-implement-raf-config-command.md +66 -0
  14. package/RAF/ahslfe-config-wizard/plans/05-update-claude-md.md +60 -0
  15. package/RAF/ahstvo-token-tracker/decisions.md +44 -0
  16. package/RAF/ahstvo-token-tracker/input.md +3 -0
  17. package/RAF/ahstvo-token-tracker/outcomes/01-full-model-id-support.md +43 -0
  18. package/RAF/ahstvo-token-tracker/outcomes/02-name-generation-no-session.md +33 -0
  19. package/RAF/ahstvo-token-tracker/outcomes/03-unify-stream-json-execution.md +48 -0
  20. package/RAF/ahstvo-token-tracker/outcomes/04-token-tracking-cost-calculation.md +53 -0
  21. package/RAF/ahstvo-token-tracker/outcomes/05-token-cost-console-reporting.md +57 -0
  22. package/RAF/ahstvo-token-tracker/outcomes/06-runtime-verbose-toggle.md +53 -0
  23. package/RAF/ahstvo-token-tracker/outcomes/07-readme-config-docs.md +36 -0
  24. package/RAF/ahstvo-token-tracker/plans/01-full-model-id-support.md +35 -0
  25. package/RAF/ahstvo-token-tracker/plans/02-name-generation-no-session.md +36 -0
  26. package/RAF/ahstvo-token-tracker/plans/03-unify-stream-json-execution.md +44 -0
  27. package/RAF/ahstvo-token-tracker/plans/04-token-tracking-cost-calculation.md +56 -0
  28. package/RAF/ahstvo-token-tracker/plans/05-token-cost-console-reporting.md +55 -0
  29. package/RAF/ahstvo-token-tracker/plans/06-runtime-verbose-toggle.md +48 -0
  30. package/RAF/ahstvo-token-tracker/plans/07-readme-config-docs.md +44 -0
  31. package/README.md +34 -0
  32. package/dist/commands/config.d.ts +3 -0
  33. package/dist/commands/config.d.ts.map +1 -0
  34. package/dist/commands/config.js +173 -0
  35. package/dist/commands/config.js.map +1 -0
  36. package/dist/commands/do.d.ts.map +1 -1
  37. package/dist/commands/do.js +47 -6
  38. package/dist/commands/do.js.map +1 -1
  39. package/dist/commands/plan.d.ts.map +1 -1
  40. package/dist/commands/plan.js +3 -2
  41. package/dist/commands/plan.js.map +1 -1
  42. package/dist/core/claude-runner.d.ts +19 -2
  43. package/dist/core/claude-runner.d.ts.map +1 -1
  44. package/dist/core/claude-runner.js +43 -96
  45. package/dist/core/claude-runner.js.map +1 -1
  46. package/dist/core/failure-analyzer.d.ts.map +1 -1
  47. package/dist/core/failure-analyzer.js +6 -3
  48. package/dist/core/failure-analyzer.js.map +1 -1
  49. package/dist/core/git.d.ts.map +1 -1
  50. package/dist/core/git.js +10 -3
  51. package/dist/core/git.js.map +1 -1
  52. package/dist/core/pull-request.d.ts +1 -1
  53. package/dist/core/pull-request.d.ts.map +1 -1
  54. package/dist/core/pull-request.js +7 -4
  55. package/dist/core/pull-request.js.map +1 -1
  56. package/dist/index.js +2 -0
  57. package/dist/index.js.map +1 -1
  58. package/dist/parsers/stream-renderer.d.ts +16 -1
  59. package/dist/parsers/stream-renderer.d.ts.map +1 -1
  60. package/dist/parsers/stream-renderer.js +34 -4
  61. package/dist/parsers/stream-renderer.js.map +1 -1
  62. package/dist/prompts/execution.d.ts.map +1 -1
  63. package/dist/prompts/execution.js +11 -1
  64. package/dist/prompts/execution.js.map +1 -1
  65. package/dist/types/config.d.ts +95 -4
  66. package/dist/types/config.d.ts.map +1 -1
  67. package/dist/types/config.js +63 -3
  68. package/dist/types/config.js.map +1 -1
  69. package/dist/utils/config.d.ts +59 -7
  70. package/dist/utils/config.d.ts.map +1 -1
  71. package/dist/utils/config.js +276 -21
  72. package/dist/utils/config.js.map +1 -1
  73. package/dist/utils/name-generator.d.ts +3 -7
  74. package/dist/utils/name-generator.d.ts.map +1 -1
  75. package/dist/utils/name-generator.js +75 -61
  76. package/dist/utils/name-generator.js.map +1 -1
  77. package/dist/utils/terminal-symbols.d.ts +21 -0
  78. package/dist/utils/terminal-symbols.d.ts.map +1 -1
  79. package/dist/utils/terminal-symbols.js +62 -0
  80. package/dist/utils/terminal-symbols.js.map +1 -1
  81. package/dist/utils/token-tracker.d.ts +45 -0
  82. package/dist/utils/token-tracker.d.ts.map +1 -0
  83. package/dist/utils/token-tracker.js +107 -0
  84. package/dist/utils/token-tracker.js.map +1 -0
  85. package/dist/utils/validation.d.ts +5 -5
  86. package/dist/utils/validation.d.ts.map +1 -1
  87. package/dist/utils/validation.js +10 -6
  88. package/dist/utils/validation.js.map +1 -1
  89. package/dist/utils/verbose-toggle.d.ts +33 -0
  90. package/dist/utils/verbose-toggle.d.ts.map +1 -0
  91. package/dist/utils/verbose-toggle.js +94 -0
  92. package/dist/utils/verbose-toggle.js.map +1 -0
  93. package/package.json +1 -1
  94. package/src/commands/config.ts +204 -0
  95. package/src/commands/do.ts +56 -5
  96. package/src/commands/plan.ts +3 -2
  97. package/src/core/claude-runner.ts +59 -115
  98. package/src/core/failure-analyzer.ts +6 -3
  99. package/src/core/git.ts +10 -3
  100. package/src/core/pull-request.ts +7 -4
  101. package/src/index.ts +2 -0
  102. package/src/parsers/stream-renderer.ts +54 -4
  103. package/src/prompts/config-docs.md +331 -0
  104. package/src/prompts/execution.ts +13 -1
  105. package/src/types/config.ts +156 -7
  106. package/src/utils/config.ts +335 -21
  107. package/src/utils/name-generator.ts +84 -71
  108. package/src/utils/terminal-symbols.ts +68 -0
  109. package/src/utils/token-tracker.ts +135 -0
  110. package/src/utils/validation.ts +15 -10
  111. package/src/utils/verbose-toggle.ts +103 -0
  112. package/tests/unit/claude-runner.test.ts +171 -7
  113. package/tests/unit/config-command.test.ts +163 -0
  114. package/tests/unit/config.test.ts +608 -30
  115. package/tests/unit/name-generator.test.ts +99 -75
  116. package/tests/unit/pull-request.test.ts +2 -0
  117. package/tests/unit/stream-renderer.test.ts +83 -0
  118. package/tests/unit/terminal-symbols.test.ts +157 -0
  119. package/tests/unit/token-tracker.test.ts +352 -0
  120. package/tests/unit/verbose-toggle.test.ts +204 -0
@@ -0,0 +1,66 @@
1
+ # Task: Implement `raf config` Command
2
+
3
+ ## Objective
4
+ Create the `raf config [prompt]` CLI command that launches an interactive Claude Sonnet TTY session for editing RAF configuration.
5
+
6
+ ## Context
7
+ Users need a natural-language way to modify their config. This command spawns Claude Sonnet with the config documentation as a system prompt, giving it full knowledge of the schema. It also supports `--reset` to restore defaults.
8
+
9
+ ## Dependencies
10
+ 01, 02, 03
11
+
12
+ ## Requirements
13
+
14
+ ### `raf config` (no arguments)
15
+ - Spawn an interactive Claude Sonnet TTY session (same node-pty approach used in `raf plan`)
16
+ - Claude receives the config docs from `src/prompts/config-docs.md` as an appended system prompt
17
+ - Claude also receives the current config file contents (or "no config file exists, using defaults")
18
+ - User has a back-and-forth conversation with Claude to modify settings
19
+ - Claude reads, modifies, and writes `~/.raf/raf.config.json`
20
+ - After Claude writes, RAF validates the result and reports errors if invalid
21
+
22
+ ### `raf config <prompt>`
23
+ - Same as above but with an initial prompt pre-filled
24
+ - Still interactive TTY — user can continue the conversation after the initial prompt is handled
25
+
26
+ ### `raf config --reset`
27
+ - Prompt user for confirmation ("This will delete ~/.raf/raf.config.json and restore all defaults. Continue? [y/N]")
28
+ - On confirm: delete the config file, print success message
29
+ - On deny: abort, print "Cancelled"
30
+
31
+ ### Validation after session:
32
+ - When the Claude session ends, read the config file and validate it
33
+ - If invalid: print warnings showing what's wrong, but don't delete the file (user can fix it)
34
+ - If valid: print "Config updated successfully" with a summary of changes
35
+
36
+ ### Model and effort:
37
+ - Use `config.models.config` for the model (default: `'sonnet'`)
38
+ - Use `config.effort.config` for reasoning effort (default: `'medium'`)
39
+
40
+ ## Implementation Steps
41
+ 1. Create `src/commands/config.ts` with the Commander.js command definition
42
+ 2. Register the command in `src/index.ts` CLI setup
43
+ 3. Implement `--reset` flag: confirmation prompt, file deletion, feedback
44
+ 4. Build the system prompt: load `src/prompts/config-docs.md` content, append current config state
45
+ 5. Spawn interactive Claude session using the existing `ClaudeRunner` TTY infrastructure (same pattern as planning mode)
46
+ 6. Pass the model from `config.models.config` and effort from `config.effort.config`
47
+ 7. After session ends: validate the config file, report results
48
+ 8. Handle edge cases: `~/.raf/` directory doesn't exist (create it), config file doesn't exist yet (that's fine, Claude will create it)
49
+ 9. Write tests for the command setup, reset flow, and post-session validation
50
+
51
+ ## Acceptance Criteria
52
+ - [ ] `raf config` starts an interactive Claude Sonnet session with config knowledge
53
+ - [ ] `raf config "use haiku for name generation"` starts session with that prompt
54
+ - [ ] `raf config --reset` prompts for confirmation and deletes config file
55
+ - [ ] Claude session has full config documentation as system prompt
56
+ - [ ] Claude session shows current config state
57
+ - [ ] Post-session validation checks the config file
58
+ - [ ] `~/.raf/` directory is created if it doesn't exist
59
+ - [ ] Command is registered and appears in `raf --help`
60
+ - [ ] Tests cover command setup and reset flow
61
+
62
+ ## Notes
63
+ - Reuse the existing TTY session infrastructure from `src/core/claude-runner.ts` — don't reinvent it
64
+ - The planning command in `src/commands/plan.ts` is the best reference for how to spawn interactive Claude sessions
65
+ - Claude needs `--dangerously-skip-permissions` so it can write to `~/.raf/raf.config.json` without asking
66
+ - The config docs file needs to be readable at runtime — consider how the built JS accesses the .md file (may need to copy it to dist or embed it as a string)
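Implementation step 4 above (load the config docs, then append the current config state) can be sketched as a pure function. This is a minimal illustration, not the package's code: `buildConfigSystemPrompt` and the "Current config" heading are hypothetical, and only the fallback wording comes from the plan.

```typescript
// Hypothetical helper: the name, signature, and heading text are illustrative only.
function buildConfigSystemPrompt(configDocs: string, currentConfig: string | null): string {
  // When no config file exists yet, use the fallback wording given in the plan.
  const state =
    currentConfig === null ? "no config file exists, using defaults" : currentConfig;
  // Docs first, then the current state, separated by a blank line.
  return `${configDocs.trimEnd()}\n\n## Current config\n\n${state}\n`;
}

// Example: a first run, before ~/.raf/raf.config.json has been created.
console.log(buildConfigSystemPrompt("# RAF config reference\n", null));
```

Keeping the prompt assembly free of file I/O would let the command read the docs and config file however the build ends up shipping them, while the string logic stays easy to unit-test.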
@@ -0,0 +1,60 @@
1
+ # Task: Update CLAUDE.md with Config Architecture
2
+
3
+ ## Objective
4
+ Update CLAUDE.md with the new config system architecture, the "configurable by default" principle, and all related documentation.
5
+
6
+ ## Context
7
+ CLAUDE.md serves as the project's internal documentation and instruction set. It needs to reflect the new config system so that future Claude sessions (planning, execution) understand how config works. The user also wants an explicit architectural principle: if a feature can be configurable, it should be configurable via config.
8
+
9
+ ## Dependencies
10
+ 01, 02, 03, 04
11
+
12
+ ## Requirements
13
+
14
+ ### Add "Configurable by Default" principle:
15
+ - New architectural decision section stating: if a feature/setting can be configurable, it should be configurable through the config system
16
+ - Default config provides sensible defaults; global config in `~/.raf/raf.config.json` overrides them
17
+ - CLI flags override config values (three-tier precedence: CLI flag > global config > built-in defaults)
18
+
19
+ ### Document the config system:
20
+ - Config file location: `~/.raf/raf.config.json`
21
+ - Schema overview: `models`, `effort`, `timeout`, `maxRetries`, `autoCommit`, `worktree`, `commitFormat`, `claudeCommand`
22
+ - Precedence chain: CLI flag > `~/.raf/raf.config.json` > DEFAULT_CONFIG
23
+ - Validation: strict, unknown keys rejected
24
+ - Reference to `src/types/config.ts` as single source of truth
25
+
26
+ ### Document the `raf config` command:
27
+ - `raf config` — interactive Claude session for config editing
28
+ - `raf config <prompt>` — interactive session with initial prompt
29
+ - `raf config --reset` — restore defaults with confirmation
30
+ - Config docs bundled at `src/prompts/config-docs.md`
31
+
32
+ ### Update existing sections:
33
+ - Update "Development Commands" if any new npm scripts were added
34
+ - Update file structure to include new files (`src/commands/config.ts`, `src/prompts/config-docs.md`)
35
+ - Update any references to the old config system (`raf.config.json` in project dirs → `~/.raf/raf.config.json`)
36
+ - Update the commit format section to reference config templates instead of hardcoded formats
37
+
38
+ ## Implementation Steps
39
+ 1. Read current CLAUDE.md to understand existing structure
40
+ 2. Add "Configurable by Default" as a new architectural decision section
41
+ 3. Add "Configuration System" section documenting schema, loading, validation, precedence
42
+ 4. Add `raf config` to the commands documentation
43
+ 5. Update directory structure to include new files
44
+ 6. Update commit format documentation to reference templates
45
+ 7. Update any stale references to the old project-local config
46
+ 8. Keep the document concise — link to `src/prompts/config-docs.md` for full config reference rather than duplicating it
47
+
48
+ ## Acceptance Criteria
49
+ - [ ] "Configurable by Default" principle documented as architectural decision
50
+ - [ ] Config system fully documented (location, schema overview, precedence, validation)
51
+ - [ ] `raf config` command documented
52
+ - [ ] Directory structure updated
53
+ - [ ] Old config references updated
54
+ - [ ] Commit format section references templates
55
+ - [ ] CLAUDE.md remains well-organized and concise
56
+
57
+ ## Notes
58
+ - CLAUDE.md is the source of truth for Claude sessions working on this project — accuracy is critical
59
+ - Don't duplicate the full config reference (that's in `src/prompts/config-docs.md`) — just provide an overview and pointer
60
+ - The principle "configurable by default" should guide future development: when adding new features, add corresponding config keys
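The three-tier precedence described above (CLI flag > `~/.raf/raf.config.json` > built-in defaults) could look roughly like this for a single setting. `resolveSetting` is a hypothetical name used for illustration; the package may well resolve the whole config object at once instead.

```typescript
// Hypothetical helper: returns the first defined value in precedence order.
function resolveSetting<T>(
  cliFlag: T | undefined,
  globalConfig: T | undefined,
  builtInDefault: T,
): T {
  // Compare against undefined rather than truthiness so that explicit
  // values such as `false` or `0` still override lower tiers.
  if (cliFlag !== undefined) return cliFlag;
  if (globalConfig !== undefined) return globalConfig;
  return builtInDefault;
}

// Example: no CLI flag given, global config sets a value, default is ignored.
console.log(resolveSetting<boolean>(undefined, false, true));
```

Checking for `undefined` instead of using `||` matters for boolean settings: a user who sets a flag to `false` in the global config should not silently get the default back.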
@@ -0,0 +1,44 @@
1
+ # Project Decisions
2
+
3
+ ## Should short model names (sonnet, haiku, opus) still be supported alongside full model IDs?
4
+ Both supported. Keep short names as aliases and also accept full model IDs like claude-opus-4-5-20251101.
5
+
6
+ ## How should full model IDs be validated?
7
+ Regex pattern. Validate that full IDs match a pattern like claude-{family}-{version} (e.g., claude-opus-4-5-20251101).
8
+
9
+ ## Should default config values stay as short names or change to full model IDs?
10
+ Keep short names. Defaults stay as 'opus', 'sonnet', 'haiku' - simple and auto-resolve to latest.
11
+
12
+ ## How should name generation call Claude CLI to avoid registering a session?
13
+ Use the same spawn-based approach as `raf do` (claude-runner.ts) but without the `--dangerously-skip-permissions` flag. Use `spawn` with `-p` flag for non-interactive print mode. Additionally, use the `--no-session-persistence` flag which prevents sessions from being saved to disk.
14
+
15
+ ## What token usage data is available from Claude CLI?
16
+ Claude CLI provides comprehensive token data in both `--output-format json` and stream-json `result` events:
17
+ - `total_cost_usd`: Total cost in USD (already calculated by Claude CLI)
18
+ - `usage.input_tokens`, `usage.output_tokens`, `usage.cache_creation_input_tokens`, `usage.cache_read_input_tokens`
19
+ - `modelUsage.<model-id>`: Per-model breakdown with `inputTokens`, `outputTokens`, `cacheReadInputTokens`, `cacheCreationInputTokens`, `costUSD`
20
+ - `duration_ms`, `duration_api_ms`: Timing info
21
+
22
+ ## Where should token reports be displayed?
23
+ Console output only. Print token counts and cost estimates to terminal after each task and at the end (total).
24
+
25
+ ## Should pricing be hardcoded or configurable?
26
+ Configurable in raf config, with current prices as defaults.
27
+
28
+ ## Should we use Claude CLI's built-in total_cost_usd or compute our own?
29
+ Compute own price from token counts × configurable prices. CLI's total_cost_usd doesn't work for subscription users.
30
+
31
+ ## How should task execution capture token counts?
32
+ Switch all task execution to stream-json mode (already used in verbose mode). Parse the `result` event for usage data. Non-verbose mode suppresses tool display but still uses stream-json format to capture tokens.
33
+
34
+ ## How should verbose mode toggling during execution work?
35
+ Keypress listener on process.stdin. Since all execution uses stream-json after task 03, toggling is purely a display concern — whether tool-use lines are printed or suppressed. Node's event loop can handle stdin events and child process output concurrently.
36
+
37
+ ## Which key toggles verbose mode?
38
+ Tab key. Press Tab during task execution to toggle verbose display on/off.
39
+
40
+ ## How detailed should the raf config README section be?
41
+ Brief + examples. Command usage, 1-2 basic config examples, mention that `raf config` launches an interactive Claude session for help. Keep it concise like other command sections in the README.
42
+
43
+ ## Should CLAUDE.md mention README update requirements more explicitly?
44
+ Yes. Add a note about always updating README when adding/changing CLI commands, API changes, or important features (like worktrees, config).
@@ -0,0 +1,3 @@
1
+ - [ ] Add token counts and report them at the end of each task, with a total after all tasks finish. Report both input and output tokens, and add a price estimate in $ based on Opus 4.6 pricing: https://claude.com/pricing
2
+ - [ ] Run the Claude instance for project name generation not via `execSync("claude --print ...")` but in a way that does not register a session
3
+ - [ ] Add support for full model IDs like claude-opus-4-5-20251101 (in config)
@@ -0,0 +1,43 @@
1
+ # Outcome: Support Full Model IDs in Config
2
+
3
+ ## Summary
4
+
5
+ Added support for full Claude model IDs (e.g., `claude-opus-4-5-20251101`) in the RAF config system, alongside the existing short aliases (`sonnet`, `haiku`, `opus`).
6
+
7
+ ## Changes Made
8
+
9
+ ### `src/types/config.ts`
10
+ - Added `ClaudeModelAlias` type for short names (`sonnet | haiku | opus`)
11
+ - Widened `ClaudeModelName` to accept both short aliases and full model IDs via branded string intersection
12
+ - Added `FULL_MODEL_ID_PATTERN` regex: `/^claude-[a-z]+-\d+(-\d+)*$/`
13
+ - Added `VALID_MODEL_ALIASES` constant (replacing `VALID_MODELS` which is kept as deprecated alias)
14
+
15
+ ### `src/utils/config.ts`
16
+ - Added `isValidModelName()` function that checks against both short aliases and the full model ID regex
17
+ - Updated model validation in `validateConfig()` to use `isValidModelName()`
18
+ - Updated error message to mention both short aliases and full model ID format
19
+
20
+ ### `src/utils/validation.ts`
21
+ - Updated `validateModelName()` to accept full model IDs
22
+ - Updated `resolveModelOption()` return type to `ClaudeModelName`
23
+ - Updated error message to include full model ID example
24
+ - Deprecated `ValidModelName` type alias in favor of `ClaudeModelName`
25
+
26
+ ### `src/prompts/config-docs.md`
27
+ - Updated model value description to mention full model IDs
28
+ - Updated validation rules to describe the new pattern
29
+ - Added "Pinned Model Versions" example config section
30
+
31
+ ### `tests/unit/config.test.ts`
32
+ - Added `isValidModelName` tests: short aliases, full model IDs, and invalid strings
33
+ - Added `validateConfig` tests for full model IDs (with and without date suffix)
34
+ - Added `validateConfig` test for rejecting random strings
35
+ - Added `resolveConfig` test for deep-merging full model ID overrides
36
+
37
+ ## Verification
38
+
39
+ - TypeScript build passes cleanly
40
+ - All 1066 tests pass (7 new tests added)
41
+ - 2 pre-existing test failures confirmed unrelated (same on base branch)
42
+
43
+ <promise>COMPLETE</promise>
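The validation rule described above can be illustrated with the regex quoted in the outcome. The pattern and the alias list are taken from the text; the function body is an assumed sketch and may differ from the real `isValidModelName()`.

```typescript
// Short aliases and the full-ID pattern as quoted in the outcome notes.
const VALID_MODEL_ALIASES: readonly string[] = ["sonnet", "haiku", "opus"];
const FULL_MODEL_ID_PATTERN = /^claude-[a-z]+-\d+(-\d+)*$/;

// Assumed implementation: accept either a short alias or a string matching the pattern.
function isValidModelName(value: unknown): boolean {
  if (typeof value !== "string") return false;
  return VALID_MODEL_ALIASES.some((alias) => alias === value) || FULL_MODEL_ID_PATTERN.test(value);
}

console.log(isValidModelName("claude-opus-4-5-20251101")); // full ID with date suffix
console.log(isValidModelName("claude-3-5-sonnet-20241022")); // version-first ID
```

One consequence of the pattern as written: it expects the family name directly after `claude-`, so version-first IDs such as `claude-3-5-sonnet-20241022` do not match.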
@@ -0,0 +1,33 @@
1
+ # Outcome: Fix Name Generation to Not Register Sessions
2
+
3
+ ## Summary
4
+
5
+ Replaced `execSync` with `spawn` in name generation and added `--no-session-persistence` flag to prevent throwaway name generation calls from cluttering the user's Claude session history.
6
+
7
+ ## Changes Made
8
+
9
+ ### `src/utils/name-generator.ts`
10
+ - Replaced `import { execSync }` with `import { spawn }` from `node:child_process`
11
+ - Added `runClaudePrint()` helper function that uses `spawn` with `--no-session-persistence` and `-p` flags
12
+ - Simplified `callSonnetForName()` and `callSonnetForMultipleNames()` to use `runClaudePrint()`
13
+ - Removed `escapeShellArg()` function (no longer needed - spawn passes args as array, not shell string)
14
+ - Removed `escapeShellArg` from exports
15
+ - Kept same 30-second timeout behavior
16
+ - Kept same fallback behavior on failure (returns null for single name, empty array for multiple)
17
+
18
+ ### `tests/unit/name-generator.test.ts`
19
+ - Replaced `execSync` mock with `spawn` mock using EventEmitter-based fake ChildProcess
20
+ - Added `createMockSpawn()` helper to create mock spawn return values
21
+ - Added test for `--no-session-persistence` flag presence
22
+ - Added test for spawn `error` event handling (e.g., ENOENT)
23
+ - Removed `escapeShellArg` tests (function removed)
24
+ - Updated model expectation from `sonnet` to `haiku` (actual default for nameGeneration)
25
+ - All 29 tests pass
26
+
27
+ ## Verification
28
+
29
+ - TypeScript build passes cleanly
30
+ - All 29 name-generator tests pass
31
+ - Full test suite: 1065 passed, 1 failed (pre-existing, unrelated to this change)
32
+
33
+ <promise>COMPLETE</promise>
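The reason `escapeShellArg()` became unnecessary is that `spawn` takes its arguments as an array, so the prompt is passed as one argv entry and never interpreted by a shell. A rough sketch of building that array follows. `buildPrintArgs` is a hypothetical helper, and the use of `--model` and the argument order are assumptions; only `-p` and `--no-session-persistence` are named in the outcome above.

```typescript
// Hypothetical helper: builds the argv array for a non-interactive, non-persisted call.
// The --model flag and the ordering are assumptions made for illustration.
function buildPrintArgs(model: string, prompt: string): string[] {
  return ["-p", "--no-session-persistence", "--model", model, prompt];
}

// The prompt keeps its quotes and metacharacters untouched because no shell parses it.
const exampleArgs = buildPrintArgs("haiku", `Name a project about "config"; keep it short`);
console.log(exampleArgs);
```

An array built this way would be handed to `spawn` from `node:child_process` as its second parameter, with the timeout and fallback handling layered on top.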
@@ -0,0 +1,48 @@
1
+ # Outcome: Unify Task Execution to Stream-JSON Format
2
+
3
+ ## Summary
4
+
5
+ Unified both `run()` and `runVerbose()` methods in `ClaudeRunner` to use `--output-format stream-json --verbose` internally, enabling token usage data extraction from every task execution. The difference between verbose and non-verbose is now purely a display concern.
6
+
7
+ ## Changes Made
8
+
9
+ ### `src/types/config.ts`
10
+ - Added `ModelTokenUsage` interface (per-model token breakdown)
11
+ - Added `UsageData` interface (aggregate + per-model token usage)
12
+
13
+ ### `src/parsers/stream-renderer.ts`
14
+ - Added `UsageData` and `ModelTokenUsage` imports from types
15
+ - Extended `StreamEvent` interface with `usage` and `modelUsage` fields
16
+ - Added `usageData?: UsageData` field to `RenderResult` interface
17
+ - Updated `renderResult()` to extract usage data from result events
18
+ - Added `extractUsageData()` helper function
19
+
20
+ ### `src/core/claude-runner.ts`
21
+ - Added `UsageData` import from types
22
+ - Added `usageData?: UsageData` field to `RunResult` interface
23
+ - Replaced separate `run()` and `runVerbose()` implementations with a unified `_runStreamJson()` private method
24
+ - `run()` now delegates to `_runStreamJson(prompt, options, false)` — suppresses display
25
+ - `runVerbose()` now delegates to `_runStreamJson(prompt, options, true)` — shows display
26
+ - Both methods now use `--output-format stream-json --verbose` flags
27
+ - Both methods capture `usageData` from the stream-json result event
28
+
29
+ ### `tests/unit/stream-renderer.test.ts`
30
+ - Added 4 new tests for usage data extraction from result events
31
+ - Tests cover: full usage data, missing usage, partial usage, multi-model usage
32
+
33
+ ### `tests/unit/claude-runner.test.ts`
34
+ - Updated test asserting `run()` does NOT have stream-json flags → now asserts it DOES
35
+ - Added trailing newlines to context overflow test data (needed for NDJSON line buffering)
36
+ - Added 4 new tests in "usage data extraction" describe block:
37
+ - `run()` returns usageData from result events
38
+ - `runVerbose()` returns usageData from result events
39
+ - undefined usageData when no result event
40
+ - `run()` suppresses display but still captures usage data
41
+
42
+ ## Verification
43
+
44
+ - TypeScript build passes cleanly
45
+ - All 1073 tests pass (8 new tests added)
46
+ - 1 pre-existing test failure confirmed unrelated (same on base branch)
47
+
48
+ <promise>COMPLETE</promise>
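A simplified sketch of pulling usage data out of one stream-json line is shown below. The raw field names (`usage.input_tokens`, `modelUsage.<model-id>.inputTokens`, and so on) follow the decisions file; the `type: "result"` discriminator and the exact shape of `UsageData` are assumptions, so the real `extractUsageData()` may differ.

```typescript
// Assumed shapes; the package's actual interfaces may carry different fields.
interface ModelTokenUsage {
  inputTokens: number;
  outputTokens: number;
  cacheReadInputTokens: number;
  cacheCreationInputTokens: number;
}

interface UsageData extends ModelTokenUsage {
  modelUsage: Record<string, ModelTokenUsage>;
}

// Treat missing or non-numeric counters as zero.
function toCount(value: unknown): number {
  return typeof value === "number" && Number.isFinite(value) ? value : 0;
}

// Parses a single NDJSON line and returns usage data only for result events.
function extractUsageData(line: string): UsageData | undefined {
  let event: any;
  try {
    event = JSON.parse(line);
  } catch {
    return undefined; // incomplete or non-JSON line
  }
  if (event === null || typeof event !== "object") return undefined;
  if (event.type !== "result" || event.usage === null || typeof event.usage !== "object") {
    return undefined;
  }

  const modelUsage: Record<string, ModelTokenUsage> = {};
  const rawModels: Record<string, any> = event.modelUsage ?? {};
  for (const id of Object.keys(rawModels)) {
    const m = rawModels[id] ?? {};
    modelUsage[id] = {
      inputTokens: toCount(m.inputTokens),
      outputTokens: toCount(m.outputTokens),
      cacheReadInputTokens: toCount(m.cacheReadInputTokens),
      cacheCreationInputTokens: toCount(m.cacheCreationInputTokens),
    };
  }

  return {
    inputTokens: toCount(event.usage.input_tokens),
    outputTokens: toCount(event.usage.output_tokens),
    cacheReadInputTokens: toCount(event.usage.cache_read_input_tokens),
    cacheCreationInputTokens: toCount(event.usage.cache_creation_input_tokens),
    modelUsage,
  };
}
```

Returning `undefined` for anything that is not a usable result event mirrors the tested behaviour above, where `usageData` stays undefined when no result event arrives.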
@@ -0,0 +1,53 @@
1
+ # Outcome: Add Token Tracking and Cost Calculation
2
+
3
+ ## Summary
4
+
5
+ Implemented token usage accumulation across tasks and cost calculation using configurable per-model pricing. Added pricing config to the RAF config schema, a `TokenTracker` utility class, and model ID to pricing category mapping.
6
+
7
+ ## Changes Made
8
+
9
+ ### `src/types/config.ts`
10
+ - Added `PricingCategory` type (`'opus' | 'sonnet' | 'haiku'`)
11
+ - Added `ModelPricing` interface (inputPerMTok, outputPerMTok, cacheReadPerMTok, cacheCreatePerMTok)
12
+ - Added `PricingConfig` interface (per-category pricing)
13
+ - Added `pricing` field to `RafConfig` interface
14
+ - Added default pricing to `DEFAULT_CONFIG`:
15
+ - Opus: $15/$75 input/output, $1.50 cache read, $18.75 cache create
16
+ - Sonnet: $3/$15 input/output, $0.30 cache read, $3.75 cache create
17
+ - Haiku: $1/$5 input/output, $0.10 cache read, $1.25 cache create
18
+
19
+ ### `src/utils/config.ts`
20
+ - Added `pricing` to `VALID_TOP_LEVEL_KEYS` and validation sets
21
+ - Added pricing validation in `validateConfig()`: validates categories, fields, and values
22
+ - Added pricing deep-merge in `deepMerge()` (per-category field-level merging)
23
+ - Added `resolveModelPricingCategory()`: maps full model IDs (e.g., `claude-opus-4-6`) and short aliases to pricing categories
24
+ - Added `getPricing(category)` and `getPricingConfig()` accessor helpers
25
+
26
+ ### `src/utils/token-tracker.ts` (new file)
27
+ - `TokenTracker` class that accumulates `UsageData` across task executions
28
+ - `addTask(taskId, usage)`: records a task's usage and calculates per-task cost
29
+ - `getTotals()`: returns accumulated usage and cost across all tasks
30
+ - `calculateCost(usage)`: calculates cost using per-model pricing from `modelUsage` breakdown
31
+ - Falls back to sonnet pricing when model breakdown is unavailable or model family is unknown
32
+ - Exports `CostBreakdown` and `TaskUsageEntry` interfaces
33
+
34
+ ### `src/prompts/config-docs.md`
35
+ - Added `pricing` section documenting all fields, defaults, and example override
36
+ - Updated validation rules to mention pricing constraints
37
+ - Updated "Full — All Settings Explicit" example to include pricing
38
+
39
+ ### `tests/unit/token-tracker.test.ts` (new file)
40
+ - 14 tests covering: per-model cost calculation (opus/sonnet/haiku), multi-model usage, cache token pricing, fallback behavior, accumulation across tasks, custom pricing, zero tokens, per-task entries
41
+
42
+ ### `tests/unit/config.test.ts`
43
+ - Added 10 pricing validation tests (valid, partial, invalid types, unknown keys, negative values, Infinity)
44
+ - Added 3 `resolveModelPricingCategory` tests (short aliases, full IDs, unknown families)
45
+ - Added 2 `resolveConfig` pricing tests (defaults, deep-merge partial override)
46
+
47
+ ## Verification
48
+
49
+ - TypeScript build passes cleanly
50
+ - All 1103 tests pass (15 new tests added)
51
+ - 1 pre-existing test failure confirmed unrelated (same on base branch)
52
+
53
+ <promise>COMPLETE</promise>
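The cost calculation described above comes down to token counts multiplied by per-million-token prices, with a sonnet fallback for unknown model families. The sketch below uses the default prices and field names listed in the outcome; the matching logic in `resolveModelPricingCategory` and the `TokenCounts` shape are assumptions and not the package's actual code.

```typescript
type PricingCategory = "opus" | "sonnet" | "haiku";

interface ModelPricing {
  inputPerMTok: number;
  outputPerMTok: number;
  cacheReadPerMTok: number;
  cacheCreatePerMTok: number;
}

// Default prices in USD per million tokens, as listed in the outcome above.
const DEFAULT_PRICING: Record<PricingCategory, ModelPricing> = {
  opus: { inputPerMTok: 15, outputPerMTok: 75, cacheReadPerMTok: 1.5, cacheCreatePerMTok: 18.75 },
  sonnet: { inputPerMTok: 3, outputPerMTok: 15, cacheReadPerMTok: 0.3, cacheCreatePerMTok: 3.75 },
  haiku: { inputPerMTok: 1, outputPerMTok: 5, cacheReadPerMTok: 0.1, cacheCreatePerMTok: 1.25 },
};

// Assumed matching rule: a short alias, or a full ID whose family segment follows "claude-".
function resolveModelPricingCategory(model: string): PricingCategory | null {
  const match = /^(?:claude-)?(opus|sonnet|haiku)(?:-|$)/.exec(model);
  return match ? (match[1] as PricingCategory) : null;
}

// Assumed shape for one model's token counts.
interface TokenCounts {
  inputTokens: number;
  outputTokens: number;
  cacheReadInputTokens: number;
  cacheCreationInputTokens: number;
}

// Cost in USD; unknown model families fall back to sonnet pricing.
function calculateCost(model: string, t: TokenCounts): number {
  const p = DEFAULT_PRICING[resolveModelPricingCategory(model) ?? "sonnet"];
  const perMillion =
    t.inputTokens * p.inputPerMTok +
    t.outputTokens * p.outputPerMTok +
    t.cacheReadInputTokens * p.cacheReadPerMTok +
    t.cacheCreationInputTokens * p.cacheCreatePerMTok;
  return perMillion / 1e6;
}
```

For example, one million input tokens and one million output tokens on a sonnet-family model would cost 3 + 15 = 18 USD at these defaults.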
@@ -0,0 +1,57 @@
1
+ # Outcome: Add Token/Cost Reporting to Console Output
2
+
3
+ ## Summary
4
+
5
+ Wired token usage tracking into the `raf do` execution flow, displaying per-task token summaries after each task and a grand total summary after all tasks complete.
6
+
7
+ ## Changes Made
8
+
9
+ ### `src/utils/terminal-symbols.ts`
10
+ - Added `formatNumber()`: formats numbers with thousands separators (e.g., `12,345`)
11
+ - Added `formatCost()`: formats USD cost with 2-4 decimal places (2 for >= $0.01, 4 for smaller)
12
+ - Added `formatTaskTokenSummary()`: per-task summary line showing input/output tokens, cache tokens, estimated cost
13
+ - Added `formatTokenTotalSummary()`: grand total block with dividers, total tokens, cache breakdown, estimated cost
14
+ - Added imports for `UsageData` and `CostBreakdown` types
15
+
16
+ ### `src/commands/do.ts`
17
+ - Imported `TokenTracker`, `formatTaskTokenSummary`, `formatTokenTotalSummary`
18
+ - Instantiated `TokenTracker` at the start of `executeSingleProject()`
19
+ - Added `lastUsageData` variable to capture usage from the last retry attempt
20
+ - After successful tasks: tracks usage and displays per-task summary via `logger.dim()`
21
+ - After failed tasks: tracks partial usage data and displays per-task summary
22
+ - After all tasks: displays grand total summary block if any tasks reported usage
23
+ - Tasks with no usage data (timeout, crash, no result event) are silently skipped
24
+
25
+ ### `CLAUDE.md`
26
+ - Added "Token Usage Tracking" section documenting the feature, key files, and formatting utilities
27
+
28
+ ### `tests/unit/terminal-symbols.test.ts`
29
+ - Added 20 new tests covering:
30
+ - `formatNumber`: small numbers, thousands separators, large numbers, zero
31
+ - `formatCost`: zero, normal costs, small costs, threshold boundary
32
+ - `formatTaskTokenSummary`: no cache, cache read only, cache create only, both caches, small costs
33
+ - `formatTokenTotalSummary`: no cache, cache read, cache create, both caches, divider lines
34
+
35
+ ## Output Examples
36
+
37
+ Per-task (displayed in dim text after each task):
38
+ ```
39
+ Tokens: 5,234 in / 1,023 out | Cache: 18,500 read | Est. cost: $0.42
40
+ ```
41
+
42
+ Grand total (displayed after all tasks):
43
+ ```
44
+ ── Token Usage Summary ──────────────────
45
+ Total tokens: 45,678 in / 12,345 out
46
+ Cache: 125,000 read / 8,000 created
47
+ Estimated cost: $3.75
48
+ ─────────────────────────────────────────
49
+ ```
50
+
51
+ ## Verification
52
+
53
+ - TypeScript build passes cleanly
54
+ - All 1122 tests pass (20 new tests added)
55
+ - 1 pre-existing test failure confirmed unrelated (same on base branch)
56
+
57
+ <promise>COMPLETE</promise>
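The formatting rules above (thousands separators, and 2 decimal places for costs of at least $0.01 versus 4 below that) can be sketched as follows. This is an assumed implementation that applies the stated threshold literally; the package's handling of edge cases such as zero, and of the other cache combinations, is not shown in the outcome and is not reproduced here. `formatTaskLine` is a hypothetical stand-in that covers only the cache-read variant from the per-task example.

```typescript
// Thousands separators via the en-US locale, e.g. 12345 -> "12,345".
function formatNumber(n: number): string {
  return n.toLocaleString("en-US");
}

// Literal reading of the stated rule: 2 decimals at or above $0.01, otherwise 4.
function formatCost(cost: number): string {
  return `$${cost.toFixed(cost >= 0.01 ? 2 : 4)}`;
}

// Hypothetical line builder covering only the "cache read" variant of the per-task summary.
function formatTaskLine(input: number, output: number, cacheRead: number, cost: number): string {
  return (
    `Tokens: ${formatNumber(input)} in / ${formatNumber(output)} out` +
    ` | Cache: ${formatNumber(cacheRead)} read` +
    ` | Est. cost: ${formatCost(cost)}`
  );
}

console.log(formatTaskLine(5234, 1023, 18500, 0.42));
```

The extra decimals for sub-cent amounts keep very cheap tasks from all rendering as `$0.00`.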
@@ -0,0 +1,53 @@
1
+ # Outcome: Add Runtime Verbose Toggle During Task Execution
2
+
3
+ ## Summary
4
+
5
+ Added a runtime verbose toggle that lets users press Tab during task execution to show or hide tool-use activity lines in real-time. The toggle works across sequential tasks, properly handles Ctrl+C for graceful shutdown, and automatically skips setup when stdin is not a TTY.
6
+
7
+ ## Changes Made
8
+
9
+ ### `src/utils/verbose-toggle.ts` (new file)
10
+ - `VerboseToggle` class that manages stdin raw mode and Tab keypress listening
11
+ - `start()`: sets stdin to raw mode, listens for Tab (0x09) to flip verbose state
12
+ - `stop()`: restores stdin to normal mode and removes the listener
13
+ - Ctrl+C (0x03) is re-emitted as `SIGINT` so the shutdown handler still works
14
+ - Shows `[verbose: on]` / `[verbose: off]` indicator on toggle
15
+ - Shows "Press Tab to toggle verbose mode" hint on start
16
+ - No-op when stdin is not a TTY (piped input)
17
+ - Safe to call `start()`/`stop()` multiple times
18
+
19
+ ### `src/core/claude-runner.ts`
20
+ - Added `verboseCheck?: () => boolean` option to `ClaudeRunnerOptions`
21
+ - Updated `_runStreamJson()` to use `verboseCheck` callback when provided, falling back to static `verbose` parameter
22
+ - Both the main stdout handler and the remaining-buffer handler on `close` use the `shouldDisplay()` callback
23
+
24
+ ### `src/commands/do.ts`
25
+ - Imported `VerboseToggle`
26
+ - Creates `VerboseToggle` with initial state matching `--verbose` flag
27
+ - Registers `verboseToggle.stop()` as shutdown cleanup callback
28
+ - Starts toggle listener before the task execution loop
29
+ - Passes `verboseCheck: () => verboseToggle.isVerbose` to runner options
30
+ - Stops toggle listener after the task loop completes (before summary output)
31
+
32
+ ### `tests/unit/verbose-toggle.test.ts` (new file)
33
+ - 15 tests covering:
34
+ - Initial state, active state, TTY/non-TTY behavior
35
+ - Tab keypress toggling (on/off)
36
+ - Ctrl+C re-emitting SIGINT
37
+ - Ignoring non-Tab keypresses
38
+ - Multiple bytes in single data event
39
+ - Stop/start lifecycle, double-call safety
40
+ - No response after stop
41
+ - Cross-task persistence (stop and restart)
42
+ - setRawMode error handling
43
+
44
+ ### `tests/unit/claude-runner.test.ts`
45
+ - Added 1 test: `verboseCheck` callback dynamically controls display output
46
+
47
+ ## Verification
48
+
49
+ - TypeScript build passes cleanly
50
+ - All 1138 tests pass (16 new tests added)
51
+ - 1 pre-existing test failure confirmed unrelated (same on base branch)
52
+
53
+ <promise>COMPLETE</promise>
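The toggle behaviour described in this outcome can be sketched as below. This is an illustrative reconstruction under stated assumptions, not the contents of `src/utils/verbose-toggle.ts`: the `RawInput` interface, the `VerboseToggleSketch` name, and the `onInterrupt` callback are hypothetical. The real class is described as re-emitting `SIGINT` and printing `[verbose: on]` / `[verbose: off]`, both of which are left to the caller here so the sketch stays free of side effects.

```typescript
// Hypothetical minimal view of a raw-mode-capable input stream.
interface RawInput {
  isTTY?: boolean;
  setRawMode?: (mode: boolean) => unknown;
  on(event: "data", listener: (chunk: Uint8Array) => void): unknown;
  off(event: "data", listener: (chunk: Uint8Array) => void): unknown;
}

const TAB = 0x09;
const CTRL_C = 0x03;

class VerboseToggleSketch {
  private active = false;

  // Inspect every byte so several keypresses in one data event are all handled.
  private readonly onData = (chunk: Uint8Array): void => {
    for (let i = 0; i < chunk.length; i++) {
      if (chunk[i] === TAB) {
        this.verbose = !this.verbose;
      } else if (chunk[i] === CTRL_C) {
        // Raw mode swallows Ctrl+C, so hand it to the caller explicitly.
        this.onInterrupt();
      }
      // Any other byte is ignored.
    }
  };

  constructor(
    private verbose: boolean,
    private readonly input: RawInput,
    private readonly onInterrupt: () => void,
  ) {}

  get isVerbose(): boolean {
    return this.verbose;
  }

  // No-op when already started, when input is not a TTY, or when raw mode fails.
  start(): void {
    if (this.active || !this.input.isTTY || !this.input.setRawMode) return;
    try {
      this.input.setRawMode(true);
    } catch {
      return;
    }
    this.input.on("data", this.onData);
    this.active = true;
  }

  // Safe to call repeatedly; restores normal mode and detaches the listener.
  stop(): void {
    if (!this.active) return;
    this.input.off("data", this.onData);
    try {
      this.input.setRawMode?.(false);
    } catch {
      // Ignore failures while restoring the terminal.
    }
    this.active = false;
  }
}
```

A runner would then receive something like `verboseCheck: () => toggle.isVerbose`, so each stream event consults the current state instead of a value captured at startup.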
@@ -0,0 +1,36 @@
1
+ # Outcome: Document raf config in README and Strengthen README Update Policy
2
+
3
+ ## Summary
4
+
5
+ Added `raf config` documentation to README.md and expanded CLAUDE.md's "Important Reminders" with explicit guidance on when to update the README.
6
+
7
+ ## Changes Made
8
+
9
+ ### `README.md`
10
+ - Added "Configurable" bullet to the Features list
11
+ - Added `### raf config` section after `raf status` in the Commands section, including:
12
+ - Command usage examples (interactive, with prompt, --reset)
13
+ - Precedence rule (CLI flags > config file > defaults)
14
+ - Minimal config file example with 3 common settings (models, worktree, timeout)
15
+ - Note directing users to the interactive session for full config reference
16
+ - Added `### raf config [prompt]` to the Command Reference section with `--reset` option
17
+
18
+ ### `CLAUDE.md`
19
+ - Expanded "Important Reminders" section with explicit README update policy:
20
+ - When new CLI commands are added
21
+ - When existing command flags or behavior change
22
+ - When important features are added
23
+ - When the Features list needs updating
24
+ - Separated README and CLAUDE.md update guidance into distinct items
25
+
26
+ ## Verification
27
+
28
+ - All acceptance criteria met:
29
+ - README has a `raf config` section with usage and basic example
30
+ - `raf config` appears in the Command Reference section
31
+ - Config file location (`~/.raf/raf.config.json`) and precedence rules are mentioned
32
+ - CLAUDE.md has explicit guidance about when to update README
33
+ - No existing README content was broken or removed
34
+ - Documentation tone and style match the rest of the README
35
+
36
+ <promise>COMPLETE</promise>
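The precedence rule documented in the README (CLI flags > config file > defaults) can be illustrated with a small resolver. This is only a sketch: the `RafSettings` shape and its `model`, `worktree`, and `timeout` keys are hypothetical stand-ins for the three common settings mentioned above, and the real config schema may differ.

```typescript
// Hypothetical settings shape; the real RAF config keys may differ.
interface RafSettings {
  model: string;
  worktree: boolean;
  timeout: number;
}

// Resolve each setting by precedence: CLI flag, then config file, then default.
// `??` only falls through on null/undefined, so an explicit `false` or `0` wins.
function resolveSettings(
  defaults: RafSettings,
  fileConfig: Partial<RafSettings>,
  cliFlags: Partial<RafSettings>,
): RafSettings {
  return {
    model: cliFlags.model ?? fileConfig.model ?? defaults.model,
    worktree: cliFlags.worktree ?? fileConfig.worktree ?? defaults.worktree,
    timeout: cliFlags.timeout ?? fileConfig.timeout ?? defaults.timeout,
  };
}
```

Using `??` rather than `||` matters for boolean and numeric settings, where a falsy value supplied by the user should still override a lower-priority source.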
@@ -0,0 +1,35 @@
1
+ # Task: Support Full Model IDs in Config
2
+
3
+ ## Objective
4
+ Allow users to specify full model IDs (e.g., `claude-opus-4-5-20251101`) in addition to short aliases (`opus`, `sonnet`, `haiku`) in the RAF config system.
5
+
6
+ ## Context
7
+ Currently, `ClaudeModelName` type only accepts `'sonnet' | 'haiku' | 'opus'`. The Claude CLI `--model` flag already accepts both short aliases and full model IDs, but RAF's config validation rejects full IDs. Users need this to pin specific model versions.
8
+
9
+ ## Requirements
10
+ - Keep short names (`sonnet`, `haiku`, `opus`) working as before — they remain the default
11
+ - Accept full model IDs matching the pattern `claude-{family}-{version}` (e.g., `claude-opus-4-5-20251101`, `claude-sonnet-4-5-20250929`)
12
+ - Validate full model IDs with a regex pattern (not just accept any string)
13
+ - Default config values stay as short names
14
+ - Update all types, validation, config docs, and tests
15
+
16
+ ## Implementation Steps
17
+ 1. Widen `ClaudeModelName` type to be a union of the short aliases and a branded/validated string type for full IDs
18
+ 2. Add a regex pattern constant for validating full model IDs (e.g., `/^claude-[a-z]+-[\d]+-[\d]+(-\d+)?$/` or similar — examine real model ID formats to get the pattern right)
19
+ 3. Update `VALID_MODELS` and the validation logic in `src/utils/config.ts` to accept both short names and full IDs matching the regex
20
+ 4. Update `PlanCommandOptions` and `DoCommandOptions` model types to accept full IDs
21
+ 5. Update the config documentation in `src/prompts/config-docs.md` to mention full model ID support
22
+ 6. Add tests for validation: valid short names, valid full IDs, invalid strings
23
+
24
+ ## Acceptance Criteria
25
+ - [ ] Short model names (`sonnet`, `haiku`, `opus`) continue to work in config
26
+ - [ ] Full model IDs like `claude-opus-4-5-20251101` are accepted in config
27
+ - [ ] Invalid model strings (e.g., `gpt-4`, `random-string`) are rejected with a clear error
28
+ - [ ] Config docs updated to reflect new capabilities
29
+ - [ ] All existing tests pass
30
+ - [ ] New tests cover full model ID validation
31
+
32
+ ## Notes
33
+ - The Claude CLI `--model` flag example from docs: `claude --model claude-sonnet-4-5-20250929`
34
+ - Keep the type system helpful — IDE autocomplete should still suggest short names
35
+ - The regex should be permissive enough to handle future model naming patterns but strict enough to catch obvious typos
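Implementation steps 1 to 3 could look roughly like the sketch below. It is illustrative only: `FullModelId`, `isShortModelName`, and `parseModelName` are hypothetical names, and the regex is the candidate pattern from step 2 (with `[\d]` written as `\d`), which the plan itself says should be checked against real model ID formats.

```typescript
// Short aliases stay a literal union so IDE autocomplete can still suggest them.
const SHORT_MODELS = ["sonnet", "haiku", "opus"] as const;
type ShortModelName = (typeof SHORT_MODELS)[number];

// Branded string: only obtainable through validation in parseModelName().
type FullModelId = string & { readonly __brand: "FullModelId" };
type ClaudeModelName = ShortModelName | FullModelId;

// Candidate pattern from the plan; an assumption, not a confirmed final regex.
const FULL_MODEL_ID_PATTERN = /^claude-[a-z]+-\d+-\d+(-\d+)?$/;

function isShortModelName(value: string): value is ShortModelName {
  return (SHORT_MODELS as readonly string[]).indexOf(value) !== -1;
}

// Returns the validated model name, or null when the string is not acceptable.
function parseModelName(value: string): ClaudeModelName | null {
  if (isShortModelName(value)) return value;
  if (FULL_MODEL_ID_PATTERN.test(value)) return value as FullModelId;
  return null;
}
```

Returning `null` leaves the wording of the "clear error" from the acceptance criteria to the config validation layer that calls this function.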
@@ -0,0 +1,36 @@
1
+ # Task: Fix Name Generation to Not Register Sessions
2
+
3
+ ## Objective
4
+ Replace `execSync("claude --print ...")` in name generation with a spawn-based approach using `--no-session-persistence` to prevent sessions from being saved to disk.
5
+
6
+ ## Context
7
+ Project name generation (`src/utils/name-generator.ts`) currently uses `execSync()` to call `claude --model X --print "prompt"`. This registers a session in Claude's session history, cluttering the user's session list with throwaway name generation calls. The `--no-session-persistence` CLI flag prevents this.
8
+
9
+ ## Requirements
10
+ - Replace `execSync` with `spawn` from `child_process` (same approach as `claude-runner.ts`)
11
+ - Use `-p` flag for non-interactive print mode
12
+ - Add `--no-session-persistence` flag to prevent session registration
13
+ - Do NOT use `--dangerously-skip-permissions` (not needed for simple text generation)
14
+ - Keep the same timeout behavior (30 seconds)
15
+ - Keep the same fallback behavior on failure
16
+ - The functions are already `async` so switching to spawn is natural
17
+
18
+ ## Implementation Steps
19
+ 1. In `src/utils/name-generator.ts`, replace `execSync` calls in `callSonnetForName()` and `callSonnetForMultipleNames()` with async `spawn`-based execution
20
+ 2. Add `--no-session-persistence` to the CLI arguments
21
+ 3. Collect stdout from the spawned process and return it as the result
22
+ 4. Handle errors and timeouts the same way as before (return null/empty on failure)
23
+ 5. Update tests to reflect the new spawn-based approach
24
+
25
+ ## Acceptance Criteria
26
+ - [ ] Name generation no longer uses `execSync`
27
+ - [ ] `--no-session-persistence` flag is passed to Claude CLI
28
+ - [ ] Sessions from name generation do not appear in `claude --resume` picker
29
+ - [ ] Name generation still works correctly (produces valid kebab-case names)
30
+ - [ ] Fallback to generated names works when Claude CLI fails
31
+ - [ ] All existing tests pass
32
+
33
+ ## Notes
34
+ - The `--no-session-persistence` flag is documented in Claude CLI reference: "Disable session persistence so sessions are not saved to disk and cannot be resumed (print mode only)"
35
+ - Consider using a small helper function for spawn-based CLI calls since this pattern could be reused
36
+ - The current `escapeShellArg()` helper won't be needed with spawn (arguments are passed as an array, not a shell string)
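The small spawn helper suggested in the notes might be sketched as follows. The names `buildNameGenArgs` and `runCli` are hypothetical, and the argument order is an assumption. The flags themselves (`--model`, `-p`, `--no-session-persistence`) are the ones named in this task, and the null-on-failure behaviour mirrors implementation step 4.

```typescript
import { spawn } from "node:child_process";

// Build the argument array for a name-generation call. Arguments are passed
// as an array, so no shell escaping is needed.
function buildNameGenArgs(model: string, prompt: string): string[] {
  return ["--model", model, "-p", "--no-session-persistence", prompt];
}

// Run a command, collect stdout, and resolve to null on error, non-zero exit,
// or timeout. Never rejects, so callers can fall back to generated names.
function runCli(command: string, args: string[], timeoutMs: number): Promise<string | null> {
  return new Promise((resolve) => {
    const child = spawn(command, args, { stdio: ["ignore", "pipe", "ignore"] });
    let stdout = "";

    // Kill the child and give up if it runs past the timeout.
    const timer = setTimeout(() => {
      child.kill();
      resolve(null);
    }, timeoutMs);

    child.stdout?.setEncoding("utf8");
    child.stdout?.on("data", (chunk: string) => {
      stdout += chunk;
    });

    // "error" fires when the process cannot be spawned (e.g. binary not found).
    child.on("error", () => {
      clearTimeout(timer);
      resolve(null);
    });

    child.on("close", (code) => {
      clearTimeout(timer);
      resolve(code === 0 ? stdout.trim() : null);
    });
  });
}
```

A caller could then do something like `await runCli("claude", buildNameGenArgs("sonnet", prompt), 30_000)` and fall back to a generated name when the result is `null`.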
@@ -0,0 +1,44 @@
1
+ # Task: Unify Task Execution to Stream-JSON Format
2
+
3
+ ## Objective
4
+ Switch all task execution modes to use `--output-format stream-json --verbose` so that token usage data is always available from the `result` event.
5
+
6
+ ## Context
7
+ Currently `claude-runner.ts` has two execution methods: `run()` (plain text stdout) and `runVerbose()` (stream-json with tool display). Token usage data is only available in stream-json's `result` event. To enable token tracking, all execution must use stream-json format. The difference between verbose and non-verbose becomes purely a display concern — whether tool descriptions are printed to the console.
8
+
9
+ ## Dependencies
10
+ 01
11
+
12
+ ## Requirements
13
+ - All task execution uses `--output-format stream-json --verbose` flags
14
+ - Non-verbose mode suppresses tool-use display lines but still captures the stream-json data
15
+ - The `result` event from stream-json is always parsed and returned
16
+ - The return type of execution methods is updated to include usage data extracted from the result event
17
+ - Completion detection logic continues to work (parsing text content for markers)
18
+ - No change to interactive mode (`runInteractive()` with node-pty)
19
+
20
+ ## Implementation Steps
21
+ 1. Define a `UsageData` interface/type to represent token usage extracted from the stream-json result event (input tokens, output tokens, cache read tokens, cache creation tokens, per-model breakdown)
22
+ 2. Update the `run()` method to use stream-json format internally, parse NDJSON lines, extract text content for completion detection, and capture the `result` event
23
+ 3. Add a `verbose` display option that controls whether tool-use lines are printed to stdout (true = show tools, false = suppress)
24
+ 4. Extract usage data from the `result` event's `usage` and `modelUsage` fields
25
+ 5. Update the return type of both `run()` and `runVerbose()` (or unified method) to include `UsageData`
26
+ 6. Update callers in `src/commands/do.ts` to receive and pass through usage data
27
+ 7. Update tests for the new execution flow
28
+
29
+ ## Acceptance Criteria
30
+ - [ ] `run()` and `runVerbose()` both return token usage data
31
+ - [ ] Non-verbose execution suppresses tool display but still gets usage data
32
+ - [ ] Verbose execution shows tool display as before AND returns usage data
33
+ - [ ] Completion detection (COMPLETE/FAILED markers) still works
34
+ - [ ] Context overflow detection still works
35
+ - [ ] Timeout behavior unchanged
36
+ - [ ] All existing tests pass
37
+
38
+ ## Notes
39
+ - The stream-json result event structure (from actual CLI output):
40
+ ```
41
+ {"type":"result","usage":{"input_tokens":N,"output_tokens":N,"cache_creation_input_tokens":N,"cache_read_input_tokens":N},"modelUsage":{"claude-opus-4-6":{"inputTokens":N,"outputTokens":N,...}}}
42
+ ```
43
+ - Consider merging `run()` and `runVerbose()` into a single method with a `verbose` option to reduce code duplication
44
+ - The `renderStreamEvent()` function in `src/parsers/stream-renderer.ts` already handles parsing — extend it to also return usage data when a `result` event is encountered
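Steps 1, 2, and 4 above can be sketched as a small NDJSON collector. This is a minimal illustration, assuming the `result` event has the shape shown in the notes. `UsageCollector` and `parseUsageFromLine` are hypothetical names, the `UsageData` field names are assumptions, and the per-model breakdown from `modelUsage` is omitted for brevity.

```typescript
// Assumed shape for step 1; the real UsageData also has a per-model breakdown.
interface UsageData {
  inputTokens: number;
  outputTokens: number;
  cacheReadTokens: number;
  cacheCreationTokens: number;
}

// Parse one NDJSON line; return usage only for a well-formed `result` event.
function parseUsageFromLine(line: string): UsageData | null {
  const trimmed = line.trim();
  if (trimmed === "") return null;
  let event: unknown;
  try {
    event = JSON.parse(trimmed);
  } catch {
    return null; // Not valid JSON; skip the line.
  }
  if (typeof event !== "object" || event === null) return null;
  const record = event as Record<string, unknown>;
  if (record["type"] !== "result") return null;
  const usage = record["usage"];
  if (typeof usage !== "object" || usage === null) return null;
  const fields = usage as Record<string, unknown>;
  const num = (key: string): number => {
    const value = fields[key];
    return typeof value === "number" ? value : 0;
  };
  return {
    inputTokens: num("input_tokens"),
    outputTokens: num("output_tokens"),
    cacheReadTokens: num("cache_read_input_tokens"),
    cacheCreationTokens: num("cache_creation_input_tokens"),
  };
}

// Accumulates stdout chunks, splitting on newlines and keeping the partial tail.
class UsageCollector {
  private buffer = "";
  private usage: UsageData | null = null;

  push(chunk: string): void {
    this.buffer += chunk;
    const lines = this.buffer.split("\n");
    this.buffer = lines.pop() ?? ""; // Last piece may be an incomplete line.
    for (const line of lines) this.consume(line);
  }

  // Call on process close to handle whatever remains in the buffer.
  finish(): UsageData | null {
    this.consume(this.buffer);
    this.buffer = "";
    return this.usage;
  }

  private consume(line: string): void {
    const parsed = parseUsageFromLine(line);
    if (parsed !== null) this.usage = parsed;
  }
}
```

Because the collector never prints anything, the same parsing path can serve verbose and non-verbose runs, with display decided separately.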