@mariozechner/pi-coding-agent 0.23.2 → 0.23.4
- package/CHANGELOG.md +40 -0
- package/README.md +89 -148
- package/dist/core/agent-session.d.ts +3 -1
- package/dist/core/agent-session.d.ts.map +1 -1
- package/dist/core/agent-session.js +27 -6
- package/dist/core/agent-session.js.map +1 -1
- package/dist/core/custom-tools/loader.d.ts.map +1 -1
- package/dist/core/custom-tools/loader.js +4 -2
- package/dist/core/custom-tools/loader.js.map +1 -1
- package/dist/core/hooks/loader.d.ts.map +1 -1
- package/dist/core/hooks/loader.js +29 -3
- package/dist/core/hooks/loader.js.map +1 -1
- package/dist/core/hooks/types.d.ts +2 -1
- package/dist/core/hooks/types.d.ts.map +1 -1
- package/dist/core/hooks/types.js.map +1 -1
- package/dist/core/system-prompt.d.ts.map +1 -1
- package/dist/core/system-prompt.js +3 -3
- package/dist/core/system-prompt.js.map +1 -1
- package/dist/modes/interactive/components/diff.d.ts +12 -0
- package/dist/modes/interactive/components/diff.d.ts.map +1 -0
- package/dist/modes/interactive/components/diff.js +133 -0
- package/dist/modes/interactive/components/diff.js.map +1 -0
- package/dist/modes/interactive/components/tool-execution.d.ts.map +1 -1
- package/dist/modes/interactive/components/tool-execution.js +26 -20
- package/dist/modes/interactive/components/tool-execution.js.map +1 -1
- package/dist/modes/interactive/theme/dark.json +9 -9
- package/dist/modes/interactive/theme/light.json +9 -9
- package/dist/modes/interactive/theme/theme.d.ts +10 -0
- package/dist/modes/interactive/theme/theme.d.ts.map +1 -1
- package/dist/modes/interactive/theme/theme.js +131 -3
- package/dist/modes/interactive/theme/theme.js.map +1 -1
- package/docs/custom-tools.md +19 -1
- package/docs/hooks.md +39 -19
- package/docs/rpc.md +14 -0
- package/docs/skills.md +148 -52
- package/docs/theme.md +23 -21
- package/package.json +6 -6
- package/docs/compaction-new.md +0 -387
- package/docs/compaction-strategies.ts +0 -502
- package/docs/compaction.md +0 -519
- package/docs/gemini.md +0 -255
- package/docs/truncation.md +0 -235
- package/docs/undercompaction.md +0 -313
package/docs/compaction.md
DELETED
|
@@ -1,519 +0,0 @@
|
|
|
1
|
-
# Context Compaction
|
|
2
|
-
|
|
3
|
-
Research on how other coding assistants implement context compaction to manage long conversations.
|
|
4
|
-
|
|
5
|
-
## Overview
|
|
6
|
-
|
|
7
|
-
Context compaction (also called "handoff" or "summarization") is a technique to manage the context window in long coding sessions. When conversations grow too long, performance degrades and costs increase. Compaction summarizes the conversation history into a condensed form, allowing work to continue without hitting context limits.
|
|
8
|
-
|
|
9
|
-
## Claude Code
|
|
10
|
-
|
|
11
|
-
**Manual:** `/compact` command
|
|
12
|
-
**Auto:** Triggers at ~95% context capacity ([source](https://stevekinney.com/courses/ai-development/claude-code-compaction))
|
|
13
|
-
|
|
14
|
-
### How it works
|
|
15
|
-
|
|
16
|
-
1. Takes entire conversation history
|
|
17
|
-
2. Uses an LLM to generate a summary
|
|
18
|
-
3. Starts a new session with the summary as initial context
|
|
19
|
-
4. User can provide custom instructions with `/compact` (e.g., "summarize only the TODOs") ([source](https://stevekinney.com/courses/ai-development/claude-code-compaction))
|
|
20
|
-
|
|
21
|
-
### Prompt (extracted from community)
|
|
22
|
-
|
|
23
|
-
From [r/ClaudeAI](https://www.reddit.com/r/ClaudeAI/comments/1jr52qj/here_is_claude_codes_compact_prompt/):
|
|
24
|
-
|
|
25
|
-
```
|
|
26
|
-
Your task is to create a detailed summary of the conversation so far, paying close attention to the user's explicit requests and your previous actions. This summary will be used as context when continuing the conversation, so preserve critical information including:
|
|
27
|
-
- What was accomplished
|
|
28
|
-
- Current work in progress
|
|
29
|
-
- Files involved
|
|
30
|
-
- Next steps
|
|
31
|
-
- Key user requests or constraints
|
|
32
|
-
```
|
|
33
|
-
|
|
34
|
-
### Key observations
|
|
35
|
-
|
|
36
|
-
- Auto-compact triggers at ~95% capacity but users often recommend manual compaction earlier ([source](https://stevekinney.com/courses/ai-development/claude-code-compaction))
|
|
37
|
-
- Quality can degrade with multiple compactions (cumulative information loss) ([source](https://stevekinney.com/courses/ai-development/claude-code-compaction))
|
|
38
|
-
- Different from `/clear` which wipes history completely ([source](https://stevekinney.com/courses/ai-development/claude-code-compaction))
|
|
39
|
-
- Users report the model can "go off the rails" if auto-compact happens mid-task ([source](https://stevekinney.com/courses/ai-development/claude-code-compaction))
|
|
40
|
-
|
|
41
|
-
## OpenAI Codex CLI
|
|
42
|
-
|
|
43
|
-
Source: [github.com/openai/codex](https://github.com/openai/codex) (codex-rs/core/src/compact.rs, codex-rs/core/templates/compact/)
|
|
44
|
-
|
|
45
|
-
**Manual:** `/compact` slash command
|
|
46
|
-
**Auto:** Triggers when token usage exceeds `model_auto_compact_token_limit`
|
|
47
|
-
|
|
48
|
-
### How it works
|
|
49
|
-
|
|
50
|
-
1. Uses a dedicated summarization prompt
|
|
51
|
-
2. Sends entire history with the prompt appended
|
|
52
|
-
3. Collects the summary from the model response
|
|
53
|
-
4. Builds new history: initial context + recent user messages (up to 20k tokens) + summary
|
|
54
|
-
5. Replaces session history with the compacted version
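
The history rebuild in steps 4 and 5 can be sketched roughly as follows. This is a minimal illustration, not Codex's actual code: the `Msg` shape, the `estimateTokens` heuristic (about 4 characters per token), the `buildCompactedHistory` name, and the choice to carry the summary as a user message are all assumptions made for the example.

```typescript
// Hypothetical message shape; Codex's real types differ.
interface Msg {
  role: "system" | "user" | "assistant";
  text: string;
}

// Crude estimate (about 4 characters per token); an assumption, not a real tokenizer.
const estimateTokens = (text: string): number => Math.ceil(text.length / 4);

function buildCompactedHistory(
  initialContext: Msg[],
  history: Msg[],
  summary: string,
  recentUserBudget = 20_000,
): Msg[] {
  const kept: Msg[] = [];
  let used = 0;
  // Walk backward so the newest user messages are considered first.
  for (let i = history.length - 1; i >= 0; i--) {
    const msg = history[i];
    if (msg.role !== "user") continue;
    const cost = estimateTokens(msg.text);
    if (used + cost > recentUserBudget) break; // budget exhausted
    kept.unshift(msg); // preserve chronological order
    used += cost;
  }
  // New history: initial context + recent user messages + summary.
  return [...initialContext, ...kept, { role: "user", text: summary }];
}
```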
|
|
55
|
-
|
|
56
|
-
### Prompt
|
|
57
|
-
|
|
58
|
-
From [codex-rs/core/templates/compact/prompt.md](https://github.com/openai/codex/blob/main/codex-rs/core/templates/compact/prompt.md):
|
|
59
|
-
|
|
60
|
-
```markdown
|
|
61
|
-
You are performing a CONTEXT CHECKPOINT COMPACTION. Create a handoff summary for another LLM that will resume the task.
|
|
62
|
-
|
|
63
|
-
Include:
|
|
64
|
-
- Current progress and key decisions made
|
|
65
|
-
- Important context, constraints, or user preferences
|
|
66
|
-
- What remains to be done (clear next steps)
|
|
67
|
-
- Any critical data, examples, or references needed to continue
|
|
68
|
-
|
|
69
|
-
Be concise, structured, and focused on helping the next LLM seamlessly continue the work.
|
|
70
|
-
```
|
|
71
|
-
|
|
72
|
-
### Summary prefix (prepended to summaries in new context)
|
|
73
|
-
|
|
74
|
-
From [codex-rs/core/templates/compact/summary_prefix.md](https://github.com/openai/codex/blob/main/codex-rs/core/templates/compact/summary_prefix.md):
|
|
75
|
-
|
|
76
|
-
```markdown
|
|
77
|
-
Another language model started to solve this problem and produced a summary of its thinking process. You also have access to the state of the tools that were used by that language model. Use this to build on the work that has already been done and avoid duplicating work. Here is the summary produced by the other language model, use the information in this summary to assist with your own analysis:
|
|
78
|
-
```
|
|
79
|
-
|
|
80
|
-
### Key observations
|
|
81
|
-
|
|
82
|
-
- Uses token-based threshold (`model_auto_compact_token_limit`) rather than percentage ([config/mod.rs](https://github.com/openai/codex/blob/main/codex-rs/core/src/config/mod.rs))
|
|
83
|
-
- Default thresholds vary by model (e.g., 180k for some models, 244k for others) ([config/mod.rs](https://github.com/openai/codex/blob/main/codex-rs/core/src/config/mod.rs))
|
|
84
|
-
- Preserves recent user messages (last ~20k tokens worth) alongside summary ([compact.rs](https://github.com/openai/codex/blob/main/codex-rs/core/src/compact.rs))
|
|
85
|
-
- Warns user: "Long conversations and multiple compactions can cause the model to be less accurate" ([compact.rs](https://github.com/openai/codex/blob/main/codex-rs/core/src/compact.rs))
|
|
86
|
-
- Has retry logic with exponential backoff for failed compactions ([compact.rs](https://github.com/openai/codex/blob/main/codex-rs/core/src/compact.rs))
|
|
87
|
-
- Uses "effective_context_window_percent" of 95% for safety margin ([model_family.rs](https://github.com/openai/codex/blob/main/codex-rs/core/src/model_family.rs))
|
|
88
|
-
|
|
89
|
-
## OpenCode (sst/opencode)
|
|
90
|
-
|
|
91
|
-
Source: [github.com/sst/opencode](https://github.com/sst/opencode) (packages/opencode/src/session/compaction.ts)
|
|
92
|
-
|
|
93
|
-
**Manual:** `/compact` command
|
|
94
|
-
**Auto:** Triggers when `isOverflow()` returns true (based on token usage vs model limits)
|
|
95
|
-
|
|
96
|
-
### How it works
|
|
97
|
-
|
|
98
|
-
1. Checks if tokens exceed (context_limit - output_limit) ([compaction.ts](https://github.com/sst/opencode/blob/main/packages/opencode/src/session/compaction.ts))
|
|
99
|
-
2. Creates a new assistant message marked as "summary"
|
|
100
|
-
3. Uses a compaction system prompt
|
|
101
|
-
4. Streams the summary generation
|
|
102
|
-
5. If auto-compaction, adds a "Continue if you have next steps" message
|
|
103
|
-
|
|
104
|
-
### Prompt
|
|
105
|
-
|
|
106
|
-
From [packages/opencode/src/session/prompt/compaction.txt](https://github.com/sst/opencode/blob/main/packages/opencode/src/session/prompt/compaction.txt):
|
|
107
|
-
|
|
108
|
-
```
|
|
109
|
-
You are a helpful AI assistant tasked with summarizing conversations.
|
|
110
|
-
|
|
111
|
-
When asked to summarize, provide a detailed but concise summary of the conversation.
|
|
112
|
-
Focus on information that would be helpful for continuing the conversation, including:
|
|
113
|
-
- What was done
|
|
114
|
-
- What is currently being worked on
|
|
115
|
-
- Which files are being modified
|
|
116
|
-
- What needs to be done next
|
|
117
|
-
- Key user requests, constraints, or preferences that should persist
|
|
118
|
-
- Important technical decisions and why they were made
|
|
119
|
-
|
|
120
|
-
Your summary should be comprehensive enough to provide context but concise enough to be quickly understood.
|
|
121
|
-
```
|
|
122
|
-
|
|
123
|
-
### Final user message
|
|
124
|
-
|
|
125
|
-
From [compaction.ts](https://github.com/sst/opencode/blob/main/packages/opencode/src/session/compaction.ts):
|
|
126
|
-
|
|
127
|
-
```
|
|
128
|
-
Summarize our conversation above. This summary will be the only context available when the conversation continues, so preserve critical information including: what was accomplished, current work in progress, files involved, next steps, and any key user requests or constraints. Be concise but detailed enough that work can continue seamlessly.
|
|
129
|
-
```
|
|
130
|
-
|
|
131
|
-
### Key observations
|
|
132
|
-
|
|
133
|
-
- Has a "prune" mechanism separate from compaction ([compaction.ts](https://github.com/sst/opencode/blob/main/packages/opencode/src/session/compaction.ts)):
|
|
134
|
-
  - Scans backward through tool calls
|
|
135
|
-
  - Protects last 40k tokens of tool output (PRUNE_PROTECT constant)
|
|
136
|
-
  - Prunes tool outputs beyond that threshold if >20k tokens prunable (PRUNE_MINIMUM constant)
|
|
137
|
-
- Disables auto-compaction via `OPENCODE_DISABLE_AUTOCOMPACT` env var ([flag.ts](https://github.com/sst/opencode/blob/main/packages/opencode/src/flag/flag.ts))
|
|
138
|
-
- Separate summarization for UI display (2 sentences max) vs. compaction (detailed) ([summary.ts](https://github.com/sst/opencode/blob/main/packages/opencode/src/session/summary.ts))
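
The prune pass described in the first observation might look roughly like this. It is a sketch under stated assumptions: only the two constant names and their values come from the notes above, while the `ToolOutput` shape and the `pruneToolOutputs` function are hypothetical and do not reproduce OpenCode's source.

```typescript
// Hypothetical shape for a stored tool result; OpenCode's real types differ.
interface ToolOutput {
  id: string;
  tokens: number;
  pruned: boolean;
}

const PRUNE_PROTECT = 40_000; // most recent tool-output tokens that are never pruned
const PRUNE_MINIMUM = 20_000; // prune only if more than this many tokens can be reclaimed

// Marks old tool outputs as pruned and returns their ids (newest candidate first).
function pruneToolOutputs(outputs: ToolOutput[]): string[] {
  let seen = 0;
  let prunable = 0;
  const candidates: ToolOutput[] = [];
  // Scan backward: the newest outputs fill the protected window first.
  for (let i = outputs.length - 1; i >= 0; i--) {
    const out = outputs[i];
    seen += out.tokens;
    if (seen > PRUNE_PROTECT) {
      candidates.push(out);
      prunable += out.tokens;
    }
  }
  // Not worth pruning if too little would be reclaimed.
  if (prunable <= PRUNE_MINIMUM) return [];
  for (const out of candidates) out.pruned = true;
  return candidates.map((o) => o.id);
}
```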
|
|
139
|
-
|
|
140
|
-
## Amp (Sourcegraph)
|
|
141
|
-
|
|
142
|
-
Source: [ampcode.com/guides/context-management](https://ampcode.com/guides/context-management)
|
|
143
|
-
|
|
144
|
-
**Manual:** "Handoff" feature
|
|
145
|
-
**Auto:** None (manual context management encouraged)
|
|
146
|
-
|
|
147
|
-
### How it works
|
|
148
|
-
|
|
149
|
-
Amp takes a different approach, providing tools for manual context management rather than automatic compaction:
|
|
150
|
-
|
|
151
|
-
1. **Handoff**: Specify a goal for the next task; Amp then analyzes the current thread and extracts the relevant information into a new message for a fresh thread
|
|
152
|
-
2. **Fork**: Duplicate context window at a specific point
|
|
153
|
-
3. **Edit/Restore**: Edit or restore to previous messages
|
|
154
|
-
4. **Thread References**: Reference other threads to extract information on-demand
|
|
155
|
-
|
|
156
|
-
### Key observations
|
|
157
|
-
|
|
158
|
-
- Philosophy: "For best results, keep conversations short & focused" ([source](https://ampcode.com/guides/context-management))
|
|
159
|
-
- Emphasizes that everything in context affects output quality: "everything in the context window has an influence on the output" ([source](https://ampcode.com/guides/context-management))
|
|
160
|
-
- Uses a secondary model to extract relevant information during handoff ([source](https://ampcode.com/guides/context-management))
|
|
161
|
-
- Thread references allow selective extraction without full context inclusion ([source](https://ampcode.com/guides/context-management))
|
|
162
|
-
- No automatic compaction; relies on user discipline and tooling
|
|
163
|
-
|
|
164
|
-
## Implementation Recommendations for pi-coding-agent
|
|
165
|
-
|
|
166
|
-
### `/compact` Command
|
|
167
|
-
|
|
168
|
-
```typescript
|
|
169
|
-
// User triggers: /compact [optional custom instructions]
|
|
170
|
-
// 1. Generate summary using current conversation
|
|
171
|
-
// 2. Create new session with summary as initial context
|
|
172
|
-
// 3. Optionally continue with queued user message
|
|
173
|
-
```
|
|
174
|
-
|
|
175
|
-
### Auto-compaction
|
|
176
|
-
|
|
177
|
-
```typescript
|
|
178
|
-
// Threshold-based (e.g., 85-90% of context limit)
|
|
179
|
-
// Check after each turn:
|
|
180
|
-
if (tokenUsage / contextLimit > 0.85) {
|
|
181
|
-
  await compact({ auto: true });
|
|
182
|
-
}
|
|
183
|
-
```
|
|
184
|
-
|
|
185
|
-
### Compaction Prompt
|
|
186
|
-
|
|
187
|
-
Based on research, a good compaction prompt should include:
|
|
188
|
-
|
|
189
|
-
```markdown
|
|
190
|
-
Create a detailed summary for continuing this coding session. Include:
|
|
191
|
-
|
|
192
|
-
1. **Completed work**: What tasks were finished
|
|
193
|
-
2. **Current state**: Files modified, their current status
|
|
194
|
-
3. **In progress**: What is being worked on now
|
|
195
|
-
4. **Next steps**: Clear actions to take
|
|
196
|
-
5. **Constraints**: User preferences, project requirements, key decisions made
|
|
197
|
-
6. **Critical context**: Any information essential for continuing
|
|
198
|
-
|
|
199
|
-
Be concise but preserve enough detail that work can continue seamlessly.
|
|
200
|
-
```
|
|
201
|
-
|
|
202
|
-
### Key Design Decisions
|
|
203
|
-
|
|
204
|
-
1. **Threshold**: 85-90% recommended (95% is often too late, per Claude Code user feedback)
|
|
205
|
-
2. **Pruning**: Consider pruning old tool outputs before full compaction (OpenCode approach)
|
|
206
|
-
3. **Warning**: Notify users that compaction happened and quality may degrade (Codex approach)
|
|
207
|
-
4. **Disable option**: Allow users to disable auto-compaction via flag/env (OpenCode approach)
|
|
208
|
-
5. **Custom instructions**: Support `/compact [instructions]` for targeted summaries (Claude Code approach)
|
|
209
|
-
6. **Session continuity**: New session should feel seamless (summary as hidden context)
|
|
210
|
-
|
|
211
|
-
### Existing Infrastructure
|
|
212
|
-
|
|
213
|
-
The coding-agent already has:
|
|
214
|
-
- `/clear` command that resets the session
|
|
215
|
-
- Session management with message history
|
|
216
|
-
- Token counting per turn
|
|
217
|
-
|
|
218
|
-
For compaction, we need to:
|
|
219
|
-
1. Add `/compact` command handler (similar to `/clear` but with summary)
|
|
220
|
-
2. Add token threshold checking after each assistant turn
|
|
221
|
-
3. Create a summarization prompt
|
|
222
|
-
4. Wire it to create a new session with the summary
|
|
223
|
-
|
|
224
|
-
---
|
|
225
|
-
|
|
226
|
-
## Our Implementation Plan
|
|
227
|
-
|
|
228
|
-
### Commands
|
|
229
|
-
|
|
230
|
-
- **`/compact [custom instructions]`** - Manual compaction trigger. Optional custom instructions let users guide what to focus on in the summary.
|
|
231
|
-
- **`/autocompact`** - Opens selector UI to toggle auto-compaction on/off. Also displays current power-user settings (reserveTokens, keepRecentTokens).
|
|
232
|
-
|
|
233
|
-
### Configuration
|
|
234
|
-
|
|
235
|
-
Settings stored in `~/.pi/agent/settings.json`:
|
|
236
|
-
|
|
237
|
-
```typescript
|
|
238
|
-
interface Settings {
|
|
239
|
-
  // ... existing fields
|
|
240
|
-
  compaction?: {
|
|
241
|
-
    enabled?: boolean // default: true, toggled via /autocompact
|
|
242
|
-
    reserveTokens?: number // default: 16384, power-user setting
|
|
243
|
-
    keepRecentTokens?: number // default: 20000, power-user setting
|
|
244
|
-
  }
|
|
245
|
-
}
|
|
246
|
-
```
|
|
247
|
-
|
|
248
|
-
**Why these defaults:**
|
|
249
|
-
- `reserveTokens: 16384` - Room for summary output (~13k) plus safety margin (~3k)
|
|
250
|
-
- `keepRecentTokens: 20000` - Preserves recent context verbatim, summary focuses on older content
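
As a minimal sketch of how these defaults could be applied when a settings file omits some fields: the default values come from the plan above, but `resolveCompaction` and `DEFAULT_COMPACTION` are hypothetical names used only for illustration.

```typescript
interface CompactionSettings {
  enabled?: boolean;
  reserveTokens?: number;
  keepRecentTokens?: number;
}

// Defaults as listed in the plan above.
const DEFAULT_COMPACTION: Required<CompactionSettings> = {
  enabled: true,
  reserveTokens: 16384,
  keepRecentTokens: 20000,
};

// Hypothetical helper: fill in any missing (or undefined) field with its default.
function resolveCompaction(settings?: CompactionSettings): Required<CompactionSettings> {
  return {
    enabled: settings?.enabled ?? DEFAULT_COMPACTION.enabled,
    reserveTokens: settings?.reserveTokens ?? DEFAULT_COMPACTION.reserveTokens,
    keepRecentTokens: settings?.keepRecentTokens ?? DEFAULT_COMPACTION.keepRecentTokens,
  };
}
```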
|
|
251
|
-
|
|
252
|
-
### Token Calculation
|
|
253
|
-
|
|
254
|
-
Context tokens are calculated from the **last non-aborted assistant message** using the same formula as the footer:
|
|
255
|
-
|
|
256
|
-
```
|
|
257
|
-
contextTokens = usage.input + usage.output + usage.cacheRead + usage.cacheWrite
|
|
258
|
-
```
|
|
259
|
-
|
|
260
|
-
This gives total context size across all providers. The `input` field represents non-cached input tokens, so adding `cacheRead` and `cacheWrite` gives the true total input.
|
|
261
|
-
|
|
262
|
-
**Trigger condition:**
|
|
263
|
-
```typescript
|
|
264
|
-
if (contextTokens > model.contextWindow - settings.compaction.reserveTokens) {
|
|
265
|
-
  await compact({ auto: true });
|
|
266
|
-
}
|
|
267
|
-
```
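
Putting the token formula and the trigger condition together, a self-contained sketch could look like the following. The `Usage` field names follow the formula above; the `aborted` flag, the message shapes, and the function names are assumptions for illustration and may not match the real pi-ai types.

```typescript
// Hypothetical shapes; the real pi-ai types may differ.
interface Usage {
  input: number;
  output: number;
  cacheRead: number;
  cacheWrite: number;
}
interface AssistantMessage {
  role: "assistant";
  usage: Usage;
  aborted?: boolean; // assumed marker for aborted responses
}
interface OtherMessage {
  role: "user" | "toolResult";
}
type Message = AssistantMessage | OtherMessage;

// Context size taken from the last non-aborted assistant message; 0 if there is none.
function contextTokens(messages: Message[]): number {
  for (let i = messages.length - 1; i >= 0; i--) {
    const m = messages[i];
    if (m.role === "assistant" && !m.aborted) {
      const u = m.usage;
      return u.input + u.output + u.cacheRead + u.cacheWrite;
    }
  }
  return 0;
}

// True when the context has grown into the reserved headroom.
function shouldCompact(messages: Message[], contextWindow: number, reserveTokens = 16384): boolean {
  return contextTokens(messages) > contextWindow - reserveTokens;
}
```

With a 200,000-token window and the default reserve, compaction would trigger once the context exceeds 183,616 tokens.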
|
|
268
|
-
|
|
269
|
-
### Turn Boundaries
|
|
270
|
-
|
|
271
|
-
Messages follow patterns like: `user, assistant, toolResult, toolResult, user, assistant, ...`
|
|
272
|
-
|
|
273
|
-
**Critical rule:** Never cut mid-turn. A turn = user message → assistant responses + tool results until next user message. Always cut before a user message to keep assistant + toolResult pairs intact (providers fail if toolResult is orphaned from its assistant message with the toolCall).
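
One way to honor this rule when choosing where to cut is to walk backward and only ever stop on a user message. The sketch below assumes a per-message `tokens` estimate and rounds the kept window up to a whole turn; both choices, along with the `findCutIndex` name, are illustrative assumptions rather than the project's actual logic.

```typescript
type Role = "user" | "assistant" | "toolResult";

interface Entry {
  role: Role;
  tokens: number; // assumed per-message token estimate
}

// Returns the index of the first message to keep verbatim.
// The result always points at a user message, or at messages.length
// when the history contains no user message at all.
function findCutIndex(messages: Entry[], keepRecentTokens = 20_000): number {
  let kept = 0;
  let cut = messages.length;
  for (let i = messages.length - 1; i >= 0; i--) {
    kept += messages[i].tokens;
    if (messages[i].role === "user") {
      cut = i; // a turn starts here, so cutting before it is safe
      if (kept >= keepRecentTokens) break; // enough recent context kept
    }
  }
  return cut;
}
```

Everything before the returned index would be summarized; everything from it onward stays in context, so no `toolResult` is separated from the assistant message that issued the tool call.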
|
|
274
|
-
|
|
275
|
-
### Summary Injection
|
|
276
|
-
|
|
277
|
-
The summary is injected as a **user message** with a prefix (similar to Codex approach). This makes it visible to the user and clearly frames it for the model.
|
|
278
|
-
|
|
279
|
-
Prefix:
|
|
280
|
-
```
|
|
281
|
-
Another language model worked on this task and produced a summary. Use this to continue the work without duplicating effort:
|
|
282
|
-
```
|
|
283
|
-
|
|
284
|
-
### Session File Format
|
|
285
|
-
|
|
286
|
-
Compaction events are **appended** to the session file (never inserted mid-file):
|
|
287
|
-
|
|
288
|
-
```typescript
|
|
289
|
-
interface CompactionEvent {
|
|
290
|
-
  type: "compaction"
|
|
291
|
-
  timestamp: string
|
|
292
|
-
  summary: string // The summary text
|
|
293
|
-
  keepLastMessages: number // How many messages before this event to keep
|
|
294
|
-
  tokensBefore: number // Context size before compaction
|
|
295
|
-
}
|
|
296
|
-
```
|
|
297
|
-
|
|
298
|
-
Example session file after compaction:
|
|
299
|
-
```
|
|
300
|
-
{"type": "message", "message": {"role": "user", ...}}
|
|
301
|
-
{"type": "message", "message": {"role": "assistant", ...}}
|
|
302
|
-
{"type": "message", "message": {"role": "toolResult", ...}}
|
|
303
|
-
... more messages ...
|
|
304
|
-
{"type": "compaction", "summary": "...", "keepLastMessages": 4, ...}
|
|
305
|
-
{"type": "message", "message": {"role": "user", ...}} <- new messages after compaction
|
|
306
|
-
```
|
|
307
|
-
|
|
308
|
-
**Session loader behavior:**
|
|
309
|
-
1. Find the latest compaction event
|
|
310
|
-
2. Take last `keepLastMessages` messages *before* the compaction event
|
|
311
|
-
3. Build context: `[summary_as_user_msg, ...kept_messages, ...messages_after_compaction]`
|
|
312
|
-
|
|
313
|
-
**Multiple compactions:** When doing a second compaction, don't cross the first compaction boundary. The new summary incorporates the previous summary (since current context already includes it).
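
The loader behavior and the boundary rule can be sketched as follows. The entry shapes are simplified from the session format above (a plain `content` string is assumed), and `buildContext` is a hypothetical name; the summary prefix is the one from the Summary Injection section.

```typescript
interface MessageEntry {
  type: "message";
  message: { role: "user" | "assistant" | "toolResult"; content: string };
}
interface CompactionEntry {
  type: "compaction";
  summary: string;
  keepLastMessages: number;
}
type SessionEntry = MessageEntry | CompactionEntry;
type Msg = MessageEntry["message"];

const SUMMARY_PREFIX =
  "Another language model worked on this task and produced a summary. " +
  "Use this to continue the work without duplicating effort:";

const isMessage = (e: SessionEntry): e is MessageEntry => e.type === "message";
const messagesOf = (slice: SessionEntry[]): Msg[] => slice.filter(isMessage).map((e) => e.message);

// Index of the last compaction entry strictly before `end`, or -1.
function lastCompactionBefore(entries: SessionEntry[], end: number): number {
  for (let i = end - 1; i >= 0; i--) {
    if (entries[i].type === "compaction") return i;
  }
  return -1;
}

function buildContext(entries: SessionEntry[]): Msg[] {
  const latest = lastCompactionBefore(entries, entries.length);
  if (latest === -1) return messagesOf(entries); // never compacted

  const compaction = entries[latest] as CompactionEntry;
  // Boundary rule: only look at messages after the previous compaction.
  const boundary = lastCompactionBefore(entries, latest);
  const before = messagesOf(entries.slice(boundary + 1, latest));
  // slice(-0) would return everything, so handle 0 explicitly.
  const kept = compaction.keepLastMessages > 0 ? before.slice(-compaction.keepLastMessages) : [];
  const after = messagesOf(entries.slice(latest + 1));

  const summaryMsg: Msg = { role: "user", content: `${SUMMARY_PREFIX}\n\n${compaction.summary}` };
  return [summaryMsg, ...kept, ...after];
}
```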
|
|
314
|
-
|
|
315
|
-
#### Example: Single Compaction
|
|
316
|
-
|
|
317
|
-
Session file with messages (u=user, a=assistant, t=toolResult):
|
|
318
|
-
```
|
|
319
|
-
u1, a1, t1, t1, a1, u2, a2, u3, a3, t3, a3, t3, a3, u4, a4, t4, a4
|
|
320
|
-
```
|
|
321
|
-
|
|
322
|
-
Compaction triggers, keeping last 4 messages. The compaction event is appended:
|
|
323
|
-
```
|
|
324
|
-
u1, a1, t1, t1, a1, u2, a2, u3, a3, t3, a3, t3, a3, u4, a4, t4, a4
|
|
325
|
-
[COMPACTION: summary="...", keepLastMessages=4]
|
|
326
|
-
```
|
|
327
|
-
|
|
328
|
-
Session loader builds context:
|
|
329
|
-
```
|
|
330
|
-
[summary_as_user_msg], u4, a4, t4, a4
|
|
331
|
-
```
|
|
332
|
-
|
|
333
|
-
New messages after compaction are appended:
|
|
334
|
-
```
|
|
335
|
-
u1, a1, t1, t1, a1, u2, a2, u3, a3, t3, a3, t3, a3, u4, a4, t4, a4
|
|
336
|
-
[COMPACTION: summary="...", keepLastMessages=4]
|
|
337
|
-
u5, a5
|
|
338
|
-
```
|
|
339
|
-
|
|
340
|
-
Session loader now builds:
|
|
341
|
-
```
|
|
342
|
-
[summary_as_user_msg], u4, a4, t4, a4, u5, a5
|
|
343
|
-
```
|
|
344
|
-
|
|
345
|
-
#### Example: Multiple Compactions
|
|
346
|
-
|
|
347
|
-
After more messages, second compaction triggers:
|
|
348
|
-
```
|
|
349
|
-
u1, a1, t1, t1, a1, u2, a2, u3, a3, t3, a3, t3, a3, u4, a4, t4, a4
|
|
350
|
-
[COMPACTION 1: summary="...", keepLastMessages=4]
|
|
351
|
-
u5, a5, u6, a6, t6, a6, u7, a7
|
|
352
|
-
[COMPACTION 2: summary="...", keepLastMessages=6]
|
|
353
|
-
```
|
|
354
|
-
|
|
355
|
-
Session loader finds COMPACTION 2 (latest), builds:
|
|
356
|
-
```
|
|
357
|
-
[summary2_as_user_msg], u6, a6, t6, a6, u7, a7
|
|
358
|
-
```
|
|
359
|
-
|
|
360
|
-
Note: COMPACTION 2's summary incorporates COMPACTION 1's summary because the summarization model received the full current context (which included summary1 as first message).
|
|
361
|
-
|
|
362
|
-
**Boundary rule:** When calculating `keepLastMessages` for COMPACTION 2, we only count messages between COMPACTION 1 and COMPACTION 2. If `keepLastMessages` exceeds the available messages (e.g., keepLastMessages=10 but only 8 messages exist after COMPACTION 1), we take all available messages up to the boundary. We never cross a compaction boundary.
|
|
363
|
-
|
|
364
|
-
### Summarization
|
|
365
|
-
|
|
366
|
-
Use **pi-ai directly** (not the full agent loop) for summarization:
|
|
367
|
-
- No tools needed
|
|
368
|
-
- Set `maxTokens` to `0.8 * reserveTokens` (leaves 20% for prompt overhead and safety margin)
|
|
369
|
-
- Pass abort signal for cancellation
|
|
370
|
-
- Use the currently selected model
|
|
371
|
-
- **Reasoning disabled** (thinking level "off") since we just need a summary, not extended reasoning
|
|
372
|
-
|
|
373
|
-
With default `reserveTokens: 16384`, maxTokens = ~13107.
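
The arithmetic behind that figure, as a small sketch (`summaryMaxTokens` is a hypothetical helper name, and flooring to an integer is an assumption):

```typescript
// 80% of the reserve goes to the summary output; the remaining 20% is
// left for prompt overhead and safety margin.
function summaryMaxTokens(reserveTokens = 16384): number {
  return Math.floor(0.8 * reserveTokens);
}

const defaultMaxTokens = summaryMaxTokens(); // 0.8 * 16384 = 13107.2, floored to 13107
```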
|
|
374
|
-
|
|
375
|
-
**Prompt** (based on Codex, enhanced):
|
|
376
|
-
```markdown
|
|
377
|
-
You are performing a CONTEXT CHECKPOINT COMPACTION. Create a handoff summary for another LLM that will resume the task.
|
|
378
|
-
|
|
379
|
-
Include:
|
|
380
|
-
- Current progress and key decisions made
|
|
381
|
-
- Important context, constraints, or user preferences
|
|
382
|
-
- Absolute file paths of any relevant files that were read or modified
|
|
383
|
-
- What remains to be done (clear next steps)
|
|
384
|
-
- Any critical data, examples, or references needed to continue
|
|
385
|
-
|
|
386
|
-
Be concise, structured, and focused on helping the next LLM seamlessly continue the work.
|
|
387
|
-
```
|
|
388
|
-
|
|
389
|
-
### Error Handling
|
|
390
|
-
|
|
391
|
-
- On compaction failure: output error, let user decide what to do
|
|
392
|
-
- In JSON/RPC mode: emit `{"type": "error", "error": "message"}` (existing pattern)
|
|
393
|
-
- Compaction is abortable via the same abort signal as regular streaming
|
|
394
|
-
|
|
395
|
-
### Image Handling
|
|
396
|
-
|
|
397
|
-
Two cases:
|
|
398
|
-
1. **Images via file path in prompt** → Model reads with tool → Can be captured in summary as "image at /path/to/file.png was analyzed". Prompt instructs model to include absolute file paths.
|
|
399
|
-
2. **Images via @attachment** → Attached to user message directly → Lost in compaction (can't summarize an image). Known limitation.
|
|
400
|
-
|
|
401
|
-
### Modes
|
|
402
|
-
|
|
403
|
-
Works in all modes:
|
|
404
|
-
- **TUI**: Commands available, UI shows compaction happening
|
|
405
|
-
- **Print/JSON**: Compaction events emitted as output
|
|
406
|
-
- **RPC**: Compaction events sent to client
|
|
407
|
-
|
|
408
|
-
### Interaction with /branch
|
|
409
|
-
|
|
410
|
-
The `/branch` command lets users create a new session from a previous user message. With compaction:
|
|
411
|
-
|
|
412
|
-
- **Branch UI reads from session file directly** (not from `state.messages`) to show ALL user messages, including those before compaction events
|
|
413
|
-
- **Branching copies the raw session file** line-by-line up to (but excluding) the selected user message, preserving all compaction events and intermediate entries
|
|
414
|
-
|
|
415
|
-
#### Why read from session file instead of state.messages
|
|
416
|
-
|
|
417
|
-
After compaction, `state.messages` only contains `[summary_user_msg, ...kept_messages, ...new_messages]`. The pre-compaction messages are not in state. To allow branching to any historical point, we must read the session file directly.
|
|
418
|
-
|
|
419
|
-
#### Reworked createBranchedSession
|
|
420
|
-
|
|
421
|
-
Current implementation iterates `state.messages` and writes fresh entries. New implementation:
|
|
422
|
-
1. Read session file line by line
|
|
423
|
-
2. For each line, check if it's the target user message
|
|
424
|
-
3. Copy all lines up to (but excluding) the target user message
|
|
425
|
-
4. The target user message text goes into the editor
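
A minimal sketch of these steps, operating on the raw lines of a session file already read into memory. Identifying the target by its position among user messages, the `content` string field, and the `branchSessionLines` name are assumptions made for the example.

```typescript
interface BranchResult {
  kept: string[]; // raw lines to write to the new session file, unchanged
  editorText: string; // text of the target user message, for the editor
}

// targetUserIndex is the zero-based position of the target among user messages.
function branchSessionLines(lines: string[], targetUserIndex: number): BranchResult {
  let userCount = 0;
  for (let i = 0; i < lines.length; i++) {
    if (lines[i].trim() === "") continue; // tolerate blank lines
    const entry = JSON.parse(lines[i]);
    if (entry.type === "message" && entry.message?.role === "user") {
      if (userCount === targetUserIndex) {
        // Copy everything before the target, including compaction events.
        return { kept: lines.slice(0, i), editorText: String(entry.message.content ?? "") };
      }
      userCount++;
    }
  }
  throw new Error(`No user message with index ${targetUserIndex}`);
}
```

Because lines are copied verbatim rather than re-serialized, compaction events and any other intermediate entries survive the branch untouched.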
|
|
426
|
-
|
|
427
|
-
#### Example: Branching After Compaction
|
|
428
|
-
|
|
429
|
-
Session file:
|
|
430
|
-
```
|
|
431
|
-
u1, a1, u2, a2
|
|
432
|
-
[COMPACTION: summary="...", keepLastMessages=2]
|
|
433
|
-
u3, a3, u4, a4
|
|
434
|
-
```
|
|
435
|
-
|
|
436
|
-
User branches at u3. New session file:
|
|
437
|
-
```
|
|
438
|
-
u1, a1, u2, a2
|
|
439
|
-
[COMPACTION: summary="...", keepLastMessages=2]
|
|
440
|
-
```
|
|
441
|
-
|
|
442
|
-
Session loader builds context for new session:
|
|
443
|
-
```
|
|
444
|
-
[summary_as_user_msg], u2, a2
|
|
445
|
-
```
|
|
446
|
-
|
|
447
|
-
User's editor contains u3's text for editing/resubmission.
|
|
448
|
-
|
|
449
|
-
#### Example: Branching Before Compaction
|
|
450
|
-
|
|
451
|
-
Same session file, user branches at u2. New session file:
|
|
452
|
-
```
|
|
453
|
-
u1, a1
|
|
454
|
-
```
|
|
455
|
-
|
|
456
|
-
No compaction in new session. Session loader builds:
|
|
457
|
-
```
|
|
458
|
-
u1, a1
|
|
459
|
-
```
|
|
460
|
-
|
|
461
|
-
This effectively "undoes" the compaction, letting users recover if important context was lost.
|
|
462
|
-
|
|
463
|
-
### Auto-Compaction Trigger
|
|
464
|
-
|
|
465
|
-
Auto-compaction is checked in the agent subscription callback after each `message_end` event for assistant messages. If context tokens exceed the threshold, compaction runs.
|
|
466
|
-
|
|
467
|
-
**Why abort mid-turn:** If auto-compaction triggers after an assistant message that contains tool calls, we abort immediately rather than waiting for tool results. Waiting would risk:
|
|
468
|
-
1. Tool results filling remaining context, leaving no room for the summary
|
|
469
|
-
2. Context overflow before the next checkpoint (agent_end)
|
|
470
|
-
|
|
471
|
-
The abort causes some work loss, but the summary captures progress up to that point.
|
|
472
|
-
|
|
473
|
-
**Trigger flow (similar to `/clear` command):**
|
|
474
|
-
|
|
475
|
-
```typescript
|
|
476
|
-
async handleAutoCompaction(): Promise<void> {
|
|
477
|
-
  // 1. Unsubscribe to stop processing events (no more messages added to state/session)
|
|
478
|
-
  this.unsubscribe?.();
|
|
479
|
-
|
|
480
|
-
  // 2. Abort current agent run and wait for completion
|
|
481
|
-
  this.agent.abort();
|
|
482
|
-
  await this.agent.waitForIdle();
|
|
483
|
-
|
|
484
|
-
  // 3. Stop loading animation
|
|
485
|
-
  if (this.loadingAnimation) {
|
|
486
|
-
    this.loadingAnimation.stop();
|
|
487
|
-
    this.loadingAnimation = null;
|
|
488
|
-
  }
|
|
489
|
-
  this.statusContainer.clear();
|
|
490
|
-
|
|
491
|
-
  // 4. Perform compaction on current state:
|
|
492
|
-
  //    - Generate summary using pi-ai directly (no tools, reasoning off)
|
|
493
|
-
  //    - Write compaction event to session file
|
|
494
|
-
  //    - Rebuild agent messages (summary as user msg + kept messages)
|
|
495
|
-
  //    - Rebuild UI to reflect new state
|
|
496
|
-
|
|
497
|
-
  // 5. Resubscribe to agent
|
|
498
|
-
  this.subscribeToAgent();
|
|
499
|
-
|
|
500
|
-
  // 6. Show compaction notification to user
|
|
501
|
-
}
|
|
502
|
-
```
|
|
503
|
-
|
|
504
|
-
This mirrors the `/clear` command pattern: unsubscribe first to prevent processing abort events, then abort and wait, then do the work, then resubscribe.
|
|
505
|
-
|
|
506
|
-
### Implementation Steps
|
|
507
|
-
|
|
508
|
-
1. Add `compaction` field to `Settings` interface and `SettingsManager`
|
|
509
|
-
2. Add `CompactionEvent` type to session manager
|
|
510
|
-
3. Update session loader to handle compaction events (find latest, apply keepLastMessages with boundary rule)
|
|
511
|
-
4. Rework `createBranchedSession` to copy raw session file lines instead of re-serializing from state
|
|
512
|
-
5. Update `/branch` UI to read user messages from session file directly
|
|
513
|
-
6. Add `/compact` command handler
|
|
514
|
-
7. Add `/autocompact` command with selector UI
|
|
515
|
-
8. Add auto-compaction check in subscription callback after assistant `message_end`
|
|
516
|
-
9. Implement `handleAutoCompaction()` following the unsubscribe/abort/wait/compact/resubscribe pattern
|
|
517
|
-
10. Implement summarization function using pi-ai (no tools, reasoning off)
|
|
518
|
-
11. Add compaction event to RPC/JSON output types
|
|
519
|
-
12. Update footer to show when auto-compact is disabled
|