@martian-engineering/lossless-claw 0.9.4 → 0.11.0

@@ -32,7 +32,7 @@ Summaries are lossy by design. The "Expand for details about:" footer at the end
32
32
 
33
33
  Search across messages and/or summaries using regex or full-text search.
34
34
 
35
- Use `mode: "full_text"` for keyword or topical recall. Wrap exact multi-word phrases in quotes to preserve phrase matching. Keep the default `sort: "recency"` for recent events, switch to `sort: "relevance"` when looking for the best older match on a topic, and use `sort: "hybrid"` when you want relevance without giving up recency entirely.
35
+ Use `mode: "full_text"` for keyword or topical recall. Full-text queries are not regexes: alternation (`A|B`), regex wildcards (`.*`), character classes (`[abc]`), and anchors (`^foo`, `foo$`) require `mode: "regex"`. Wrap exact multi-word phrases in quotes to preserve phrase matching. Keep the default `sort: "recency"` for recent events, switch to `sort: "relevance"` when looking for the best older match on a topic, and use `sort: "hybrid"` when you want relevance without giving up recency entirely.
36
36
 
37
37
  **Parameters:**
38
38
 
@@ -41,7 +41,7 @@ Use `mode: "full_text"` for keyword or topical recall. Wrap exact multi-word phr
41
41
  | `pattern` | string | ✅ | — | Search pattern |
42
42
  | `mode` | string | | `"regex"` | `"regex"` or `"full_text"` |
43
43
  | `scope` | string | | `"both"` | `"messages"`, `"summaries"`, or `"both"` |
44
- | `conversationId` | number | | current | Specific conversation to search |
44
+ | `conversationId` | number | | current session family | Specific physical conversation to search |
45
45
  | `allConversations` | boolean | | `false` | Search all conversations |
46
46
  | `since` | string | | — | ISO timestamp lower bound |
47
47
  | `before` | string | | — | ISO timestamp upper bound |
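The mode guidance in this hunk can be turned into a small pre-flight check. The sketch below is illustrative only: `chooseGrepMode` and `buildGrepArgs` are hypothetical helpers, not part of lossless-claw, and the detection regex is a rough heuristic for the four syntax families the updated text names (alternation, `.*`, character classes, anchors).

```typescript
// Hypothetical helper (not a lossless-claw API): guess which lcm_grep mode a pattern needs.
type GrepMode = "regex" | "full_text";
type GrepSort = "recency" | "relevance" | "hybrid";

// Heuristic for syntax that full-text search does not interpret:
// alternation (A|B), wildcards (.*), character classes ([abc]), anchors (^foo, foo$).
const REGEX_ONLY_SYNTAX = /\||\.\*|\[[^\]]+\]|^\^|\$$/;

function chooseGrepMode(pattern: string): GrepMode {
  return REGEX_ONLY_SYNTAX.test(pattern) ? "regex" : "full_text";
}

interface GrepArgs {
  pattern: string;
  mode: GrepMode;
  sort?: GrepSort;
}

// Builds an argument object shaped like the documented lcm_grep parameters.
function buildGrepArgs(pattern: string, sort?: GrepSort): GrepArgs {
  const args: GrepArgs = { pattern, mode: chooseGrepMode(pattern) };
  if (sort !== undefined) args.sort = sort;
  return args;
}
```

For example, `buildGrepArgs("deploy|rollback")` selects `mode: "regex"`, while `buildGrepArgs("\"session rotation\"", "relevance")` stays in `full_text` with the quoted phrase preserved.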
@@ -81,7 +81,7 @@ Look up metadata and content for a specific summary or stored file.
81
81
  | Param | Type | Required | Default | Description |
82
82
  |-------|------|----------|---------|-------------|
83
83
  | `id` | string | ✅ | — | `sum_xxx` for summaries, `file_xxx` for files |
84
- | `conversationId` | number | | current | Scope to a specific conversation |
84
+ | `conversationId` | number | | current session family | Scope to a specific physical conversation |
85
85
  | `allConversations` | boolean | | `false` | Allow cross-conversation lookups |
86
86
 
87
87
  **Returns for summaries:**
@@ -124,7 +124,7 @@ When `allConversations: true` is set, `lcm_expand_query` can now synthesize one
124
124
  | `query` | string | ✅* | — | Text query to find summaries (if no `summaryIds`) |
125
125
  | `summaryIds` | string[] | ✅* | — | Specific summary IDs to expand (if no `query`) |
126
126
  | `maxTokens` | number | | 2000 | Answer length cap |
127
- | `conversationId` | number | | current | Scope to a specific conversation |
127
+ | `conversationId` | number | | current session family | Scope to a specific physical conversation |
128
128
  | `allConversations` | boolean | | `false` | Search across all conversations |
129
129
 
130
130
  *One of `query` or `summaryIds` is required.
@@ -177,7 +177,7 @@ Add instructions to your agent's system prompt so it knows when to use LCM tools
177
177
  ## Memory & Context
178
178
 
179
179
  Use LCM tools for recall:
180
- 1. `lcm_grep` — Search all conversations by keyword/regex. Prefer `mode: "full_text"` for topic recall, quote exact phrases, use `sort: "relevance"` for older-topic lookups, and `sort: "hybrid"` when recency should still matter.
180
+ 1. `lcm_grep` — Search all conversations by keyword/regex. Prefer `mode: "full_text"` for short topic terms, use `mode: "regex"` for alternation or other regex syntax, quote exact phrases, use `sort: "relevance"` for older-topic lookups, and `sort: "hybrid"` when recency should still matter.
181
181
  2. `lcm_describe` — Inspect a specific summary (cheap, no sub-agent)
182
182
  3. `lcm_expand_query` — Deep recall with bounded sub-agent expansion
183
183
 
@@ -187,7 +187,7 @@ listing something you need, use `lcm_expand_query` to get the full detail.
187
187
 
188
188
  ### Conversation scoping
189
189
 
190
- By default, tools operate on the current conversation. Use `lcm_grep(..., allConversations: true)` when you need broad global discovery. Use `lcm_expand_query(..., allConversations: true)` when you want bounded synthesis across sessions. Use `conversationId` when you already know the exact conversation to inspect or expand.
190
+ By default, tools operate on the current session family: the active conversation plus archived segments that share the same stable session identity. This keeps recall continuous across session rotation and `/reset` replacement rows without widening the search to unrelated sessions. Use `lcm_grep(..., allConversations: true)` when you need broad global discovery. Use `lcm_expand_query(..., allConversations: true)` when you want bounded synthesis across sessions. Use `conversationId` when you already know the exact physical conversation to inspect or expand.
191
191
 
192
192
  ### Performance considerations
193
193
 
@@ -56,7 +56,7 @@ When OpenClaw processes a turn, it calls the context engine's lifecycle hooks:
56
56
 
57
57
  1. **bootstrap** — On session start, reconciles the JSONL session file with the LCM database. Imports any messages that exist in the file but not in LCM (crash recovery).
58
58
  2. **ingest** / **ingestBatch** — Persists new messages to the database and appends them to context_items.
59
- 3. **afterTurn** — After the model responds, ingests new messages, then evaluates whether compaction should run.
59
+ 3. **afterTurn** — After the model responds, ingests new messages, then evaluates whether `contextThreshold` requires compaction.
60
60
 
61
61
  ### Leaf compaction
62
62
 
@@ -66,9 +66,9 @@ The **leaf pass** converts raw messages into leaf summaries:
66
66
  2. Cap the chunk at `leafChunkTokens` (default 20k tokens).
67
67
  3. Concatenate message content with timestamps.
68
68
  4. Resolve the most recent prior summary for continuity (passed as `previous_context` so the LLM avoids repeating known information).
69
- 5. Send to the LLM with the leaf prompt.
70
- 6. Normalize provider response blocks (Anthropic/OpenAI text, output_text, and nested content/summary shapes) into plain text.
71
- 7. If normalization is empty, log provider/model/block-type diagnostics and fall back to deterministic truncation.
69
+ 5. Send to OpenClaw's host-owned `runtime.llm.complete` capability with the leaf prompt.
70
+ 6. Normalize runtime LLM response text into plain text while preserving provider/model diagnostics from the host result.
71
+ 7. If normalization is empty, log provider/model diagnostics and fall back to deterministic truncation.
72
72
  8. If the summary is larger than the input (LLM failure), retry with the aggressive prompt. If still too large, fall back to deterministic truncation.
73
73
  9. Persist the summary, link to source messages, and replace the message range in context_items.
74
74
 
@@ -84,15 +84,16 @@ The **condensed pass** merges summaries at the same depth into a higher-level su
84
84
 
85
85
  ### Compaction modes
86
86
 
87
- **Incremental (after each turn):**
88
- - Checks if raw tokens outside the fresh tail exceed `leafChunkTokens`
89
- - If so, runs one leaf pass
90
- - If `incrementalMaxDepth != 0`, follows with condensation passes up to that depth (`-1` for unlimited)
91
- - Best-effort: failures don't break the conversation
87
+ **Automatic threshold sweep (after each turn):**
88
+ - Checks if the assembled context crosses `contextThreshold`
89
+ - Below threshold, does not compact and does not record leaf debt
90
+ - In deferred mode, records one `"threshold"` maintenance row for background, `maintain()`, or pre-assembly execution
91
+ - In inline mode, runs a full sweep before `afterTurn()` completes
92
92
 
93
- **Full sweep (manual `/compact` or overflow):**
93
+ **Full sweep (threshold, manual `/compact`, or overflow):**
94
94
  - Phase 1: Repeatedly runs leaf passes until no more eligible chunks
95
- - Phase 2: Repeatedly runs condensation passes starting from the shallowest eligible depth
95
+ - Phase 2: If the summarized prefix is above `summaryPrefixTargetTokens`, repeatedly runs condensation passes starting from the shallowest eligible depth, respecting the preferred `sweepMaxDepth` (`0` for leaf-only, `-1` for unlimited)
96
+ - Pressure phase: If summarized-prefix pressure remains, condensation may go beyond `sweepMaxDepth` using the hard fanout floor
96
97
  - Each pass checks for progress; stops if no tokens were saved
97
98
 
98
99
  **Budget-targeted (`compactUntilUnder`):**
@@ -206,6 +207,8 @@ LCM handles crash recovery through **bootstrap reconciliation**:
206
207
  2. Compare against the LCM database.
207
208
  3. Find the most recent message that exists in both (the "anchor").
208
209
  4. Import any messages after the anchor that are in JSONL but not in LCM.
210
+ 5. If an existing session key moves to a different transcript file and no anchor exists, treat the new file as a bounded transcript epoch and import its recoverable messages. The same flood cap used for tail reconciliation prevents large unrelated transcripts from being appended automatically.
211
+ 6. Advance the bootstrap checkpoint only after an overlap is found or a bounded epoch import succeeds. No-anchor reads that import nothing leave the old checkpoint in place so a later turn can retry.
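As a reading aid, the reconciliation steps above might be sketched as follows. This is not the package's implementation: `reconcile`, `Msg`, and `ReconcileResult` are hypothetical names, message identity by `id` is an assumption, and the flood cap is modeled as "skip the whole epoch import when the file exceeds the cap", which the text does not spell out. The cap on ordinary tail imports is omitted for brevity.

```typescript
// Hypothetical types; the real message and checkpoint shapes are not shown in this diff.
interface Msg { id: string }
interface ReconcileResult { imported: Msg[]; advanceCheckpoint: boolean }

function reconcile(
  jsonl: readonly Msg[],          // messages read from the session file, oldest first
  inLcm: ReadonlySet<string>,     // ids already persisted in the LCM database (assumed identity)
  fileChanged: boolean,           // true when the session key now points at a different transcript file
  floodCap: number,               // assumed: maximum message count for an automatic epoch import
): ReconcileResult {
  // Steps 2-3: find the most recent message present in both stores (the anchor).
  let anchor = -1;
  for (let i = jsonl.length - 1; i >= 0; i--) {
    const m = jsonl[i];
    if (m !== undefined && inLcm.has(m.id)) { anchor = i; break; }
  }
  // Step 4: overlap found, so import the tail and advance the checkpoint.
  if (anchor >= 0) {
    return { imported: jsonl.slice(anchor + 1), advanceCheckpoint: true };
  }
  // Step 5: no anchor, but the transcript file moved; treat it as a bounded epoch.
  if (fileChanged && jsonl.length > 0 && jsonl.length <= floodCap) {
    return { imported: [...jsonl], advanceCheckpoint: true };
  }
  // Step 6: nothing imported, so keep the old checkpoint and let a later turn retry.
  return { imported: [], advanceCheckpoint: false };
}
```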
209
212
 
210
213
  This handles the case where OpenClaw wrote messages to the session file but crashed before LCM could persist them.
211
214
 
@@ -213,12 +216,8 @@ This handles the case where OpenClaw wrote messages to the session file but cras
213
216
 
214
217
  All mutating operations (ingest, compact) are serialized per-session using a promise queue. This prevents races between concurrent afterTurn/compact calls for the same conversation without blocking operations on different conversations.
215
218
 
216
- ## Authentication
219
+ ## Runtime LLM boundary
217
220
 
218
- LCM needs to call an LLM for summarization. It resolves credentials through a three-tier cascade:
221
+ LCM needs model inference for summarization, but it does not resolve provider credentials, base URLs, or provider transport settings directly. Summarization calls go through OpenClaw's `runtime.llm.complete` capability, which owns model preparation, credential resolution, OAuth refresh, provider dispatch, and usage attribution.
219
222
 
220
- 1. **Auth profiles** OpenClaw's OAuth/token/API-key profile system (`auth-profiles.json`), checked in priority order
221
- 2. **Environment variables** — Standard provider env vars (`ANTHROPIC_API_KEY`, etc.)
222
- 3. **Custom provider key** — From models config (e.g., `models.json`)
223
-
224
- For OAuth providers (e.g., Anthropic via Claude Max), LCM handles token refresh and credential persistence automatically.
223
+ Configured Lossless summary model overrides (`summaryModel`, `largeFileSummaryModel`, and `fallbackProviders`) are sent as runtime LLM model override requests. OpenClaw enforces those requests with `plugins.entries.lossless-claw.llm.allowModelOverride` and `plugins.entries.lossless-claw.llm.allowedModels`; denied overrides fail closed instead of silently falling back to a different model.
@@ -0,0 +1,243 @@
1
+ # Compaction Redesign Map
2
+
3
+ Status: implementation pass
4
+ Date: 2026-05-14
5
+ Branch: `josh/compaction-redesign`
6
+
7
+ ## Goal
8
+
9
+ Lossless should stop trying to infer whether a provider prompt cache is hot or cold before deciding whether to compact. The old cache-aware incremental strategy could not be made sound with the signals available to Lossless: by the time a low cache-read observation arrives, the provider has usually already rewritten the cache for that turn. Without a reliable `expiresAt` signal, Lossless cannot safely tell "cold before this turn" from "cold during this turn, now hot again."
10
+
11
+ The new design is intentionally simpler:
12
+
13
+ 1. Do not run automatic incremental compaction from raw-history pressure.
14
+ 2. Let context grow in the assembled transcript until the configured threshold is crossed.
15
+ 3. When `contextThreshold` is crossed, run the existing full-sweep mechanism.
16
+ 4. Keep the fresh tail as the protected boundary for recent verbatim context.
17
+ 5. Reuse existing summary sizing and fanout configuration.
18
+ 6. Use a summarized-prefix pressure target only as the escape hatch when a preferred-depth sweep does not reduce enough context.
19
+
20
+ ## Implemented Decisions
21
+
22
+ | Decision | Outcome |
23
+ | --- | --- |
24
+ | Automatic compaction trigger | `contextThreshold` only. |
25
+ | Raw leaf trigger | Kept as a diagnostic/manual helper; removed from automatic scheduling. |
26
+ | Deferred debt | New automatic debt uses only reason `"threshold"`. |
27
+ | Cache hotness | No longer delays automatic threshold compaction. |
28
+ | Legacy non-threshold debt | Revalidated against threshold, then swept or marked finished as obsolete. |
29
+ | Full sweep trigger | No longer starts only because `evaluateLeafTrigger()` is true. |
30
+ | Full sweep preferred depth | `compactFullSweep()` now respects `sweepMaxDepth` during routine condensation. |
31
+ | Fresh tail | Kept. It remains independent of incremental compaction. |
32
+ | Default leaf chunk size | Kept at 20k tokens. |
33
+ | Deprecated depth key | `incrementalMaxDepth` remains accepted as an alias for `sweepMaxDepth`. |
34
+ | Pressure escape hatch | `summaryPrefixTargetTokens` lets sweeps condense beyond the preferred depth when summarized context remains too large. |
35
+ | `cacheAwareCompaction.*` | Still visible and accepted, but documented as deprecated compatibility config. |
36
+ | `dynamicLeafChunkTokens.*` | Still visible and accepted, but documented as deprecated compatibility config. |
37
+ | Engine `compactLeafAsync()` | Removed. Automatic and public engine compaction should go through threshold/full-sweep paths. |
38
+ | `CompactionEngine.compactLeaf()` | Kept as a lower-level helper and for focused tests. |
39
+ | Stable hot-cache orphan stripping | Removed with the cache-state-dependent assembly behavior. |
40
+
41
+ ## Current Lifecycle
42
+
43
+ ### Ingestion and After-Turn Scheduling
44
+
45
+ `LcmContextEngine.afterTurn()` now follows one automatic policy:
46
+
47
+ ```text
48
+ afterTurn -> ingest messages -> update telemetry -> evaluate contextThreshold
49
+
50
+ if below threshold:
51
+ do not compact
52
+ do not record maintenance debt
53
+
54
+ if threshold is crossed and mode is inline:
55
+ run threshold full sweep inline
56
+
57
+ if threshold is crossed and mode is deferred:
58
+ record one threshold maintenance row
59
+ schedule the background drain
60
+ ```
61
+
62
+ The raw-history leaf trigger is no longer part of this lifecycle. `evaluateLeafTrigger()` can still answer "is there enough old raw material for a leaf pass?", but that answer does not cause automatic maintenance.
63
+
64
+ Relevant code:
65
+
66
+ - `src/engine.ts`: `afterTurn()`
67
+ - `src/engine.ts`: `recordDeferredCompactionDebt()`
68
+ - `src/store/compaction-maintenance-store.ts`: one coalesced maintenance row per conversation
69
+ - `src/compaction.ts`: `evaluateLeafTrigger()`
70
+
71
+ ### Deferred Debt
72
+
73
+ Deferred maintenance still exists because threshold sweeps can be expensive and should often happen outside the critical response path.
74
+
75
+ New automatic debt should always use:
76
+
77
+ ```text
78
+ reason = "threshold"
79
+ ```
80
+
81
+ When the debt drains, Lossless calls threshold full sweep via `executeCompactionCore({ compactionTarget: "threshold" })`. Prompt-cache telemetry and TTLs are not consulted. Session queue idleness remains relevant because compaction should not race active session work.
82
+
83
+ Old databases may contain pending non-threshold debt from previous builds. The compatibility behavior is:
84
+
85
+ - re-evaluate `contextThreshold` at consumption time
86
+ - if the conversation is over threshold, run threshold full sweep
87
+ - if it is under threshold, mark the old debt finished with a no-op legacy reason
88
+
89
+ This clears obsolete maintenance rows without deleting persisted conversation data.
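A compact way to picture the drain-time compatibility rule is sketched below. `resolveDebt` and its types are hypothetical and do not mirror `consumeDeferredCompactionDebt()`. The sketch models only what the text states: `"threshold"` debt leads to a threshold full sweep, and any other reason is revalidated against the threshold first.

```typescript
// Hypothetical model of debt consumption; not the engine's actual types.
type DebtOutcome = "threshold-full-sweep" | "finished-noop-legacy";

interface DebtRow { reason: string }

function resolveDebt(debt: DebtRow, isOverThreshold: () => boolean): DebtOutcome {
  // Debt recorded by current builds always carries reason "threshold".
  if (debt.reason === "threshold") return "threshold-full-sweep";
  // Legacy debt: re-evaluate contextThreshold at consumption time.
  return isOverThreshold() ? "threshold-full-sweep" : "finished-noop-legacy";
}
```

Passing the threshold check as a callback keeps the re-evaluation lazy, so it only runs for legacy rows.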
90
+
91
+ Relevant code:
92
+
93
+ - `src/engine.ts`: `drainDeferredCompactionDebtNow()`
94
+ - `src/engine.ts`: `consumeDeferredCompactionDebt()`
95
+ - `src/engine.ts`: `maintain()`
96
+ - `src/engine.ts`: pre-assembly maintenance drain
97
+
98
+ ### Full Sweep
99
+
100
+ `CompactionEngine.compact()` delegates to `compactFullSweep()`.
101
+
102
+ The sweep has two phases:
103
+
104
+ 1. Leaf phase: repeatedly summarize the oldest raw chunks outside the fresh tail.
105
+ 2. Condensed phase: if summarized-prefix tokens exceed `summaryPrefixTargetTokens`, repeatedly summarize same-depth summary chunks, shallowest first.
106
+
107
+ Routine threshold sweeps use `contextThreshold` to decide when to start compaction. Once started, the leaf phase runs until no eligible raw-message chunk remains outside the fresh tail. Condensation is controlled by `summaryPrefixTargetTokens`, not by total context pressure. Forced sweeps still stop when no eligible chunk remains or when a pass stops making token progress.
108
+
109
+ `sweepMaxDepth` is the preferred source-depth cap for routine full-sweep condensation:
110
+
111
+ - `0`: leaf summaries only
112
+ - `1`: depth-0 summaries may condense into depth 1, then stop
113
+ - `2`: depth 0 -> 1 and depth 1 -> 2 are allowed
114
+ - `-1`: unlimited
115
+
116
+ The cap is intentionally aspirational. If summary tokens outside the fresh tail exceed `summaryPrefixTargetTokens` after routine condensation, Lossless runs a pressure condensation phase that may go deeper using `condensedMinFanoutHard`.
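The depth rule and the pressure escape hatch can be expressed as two small predicates. This is a sketch under stated assumptions: `routineMayCondense` and `needsPressurePhase` are hypothetical names, not functions from `src/compaction.ts`, and the pressure check assumes a target has already been resolved to a number.

```typescript
// Hypothetical predicates illustrating the documented sweepMaxDepth semantics.

// Routine condensation may merge summaries at `sourceDepth` only while the
// source depth is below the preferred cap; -1 means unlimited.
function routineMayCondense(sourceDepth: number, sweepMaxDepth: number): boolean {
  return sweepMaxDepth === -1 || sourceDepth < sweepMaxDepth;
}

// After routine condensation, the pressure phase applies only if summary tokens
// outside the fresh tail still exceed the resolved summaryPrefixTargetTokens.
function needsPressurePhase(summaryPrefixTokens: number, summaryPrefixTargetTokens: number): boolean {
  return summaryPrefixTokens > summaryPrefixTargetTokens;
}
```

With `sweepMaxDepth: 1`, `routineMayCondense(0, 1)` is true (depth 0 condenses into depth 1) and `routineMayCondense(1, 1)` is false, which matches the bullet list above.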
117
+
118
+ Relevant code:
119
+
120
+ - `src/compaction.ts`: `compactFullSweep()`
121
+ - `src/compaction.ts`: `selectOldestLeafChunk()`
122
+ - `src/compaction.ts`: `selectShallowestCondensationCandidate()`
123
+ - `src/compaction.ts`: `resolveSweepMaxDepth()`
124
+ - `src/compaction.ts`: `resolveSummaryPrefixTargetTokens()`
125
+
126
+ ### Fresh Tail
127
+
128
+ The fresh tail is not incremental compaction. It stays because it protects recent verbatim context and gives both assembly and compaction a stable boundary.
129
+
130
+ The fresh tail:
131
+
132
+ - is always included during assembly
133
+ - is excluded from leaf summarization
134
+ - may be capped by `freshTailMaxTokens`
135
+ - still preserves the newest message even when that one message exceeds the cap
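One plausible reading of these rules, as a boundary computation, is sketched below. It is not `resolveFreshTailOrdinal()`: `freshTailStart` is a hypothetical function, and how `freshTailCount` and `freshTailMaxTokens` interact (walking backward from the newest message and stopping at whichever limit is hit first) is an assumption.

```typescript
// Hypothetical sketch: returns the index of the oldest message inside the fresh tail.
// `tokenCounts` holds per-message token counts, oldest first.
function freshTailStart(
  tokenCounts: readonly number[],
  freshTailCount: number,
  freshTailMaxTokens?: number,
): number {
  const n = tokenCounts.length;
  if (n === 0) return 0;
  // The newest message is always protected, even if it alone exceeds the cap.
  let start = n - 1;
  let used = tokenCounts[start] ?? 0;
  // Extend the tail backward until the count limit or the token cap is reached.
  while (start > 0 && n - start < freshTailCount) {
    const next = tokenCounts[start - 1] ?? 0;
    if (freshTailMaxTokens !== undefined && used + next > freshTailMaxTokens) break;
    used += next;
    start -= 1;
  }
  return start;
}
```

Everything before the returned index is eligible for leaf summarization; everything from it onward stays verbatim during assembly.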
136
+
137
+ Relevant code:
138
+
139
+ - `src/assembler.ts`: `resolveFreshTailOrdinal()`
140
+ - `src/assembler.ts`: `Assembler.assemble()`
141
+ - `src/compaction.ts`: `resolveFreshTailOrdinal()`
142
+ - `src/compaction.ts`: `countRawTokensOutsideFreshTail()`
143
+
144
+ ## Removed Automatic Policy
145
+
146
+ The old `evaluateIncrementalCompaction()` path combined:
147
+
148
+ - prompt-cache telemetry
149
+ - hot/cold/unknown cache-state heuristics
150
+ - cache TTL guesses
151
+ - dynamic leaf chunk sizing
152
+ - raw-history pressure outside the fresh tail
153
+ - bounded cold-cache catch-up
154
+ - hot-cache leaf-only behavior
155
+ - budget-headroom gates
156
+
157
+ That policy is removed from automatic scheduling. The important reason is not that each individual heuristic was unreasonable; it is that the combined decision depended on cache state that Lossless cannot reliably observe at the time it must decide whether to mutate the prompt prefix.
158
+
159
+ ## Config Semantics
160
+
161
+ ### Active Settings
162
+
163
+ | Key | Role |
164
+ | --- | --- |
165
+ | `contextThreshold` | The only automatic compaction trigger. |
166
+ | `proactiveThresholdCompactionMode` | Chooses inline vs deferred threshold full sweep. |
167
+ | `freshTailCount` | Protects newest raw messages during assembly and compaction. |
168
+ | `freshTailMaxTokens` | Optional cap for protected fresh-tail size. |
169
+ | `leafChunkTokens` | Maximum raw material per leaf summary during sweep; default remains 20k. |
170
+ | `leafMinFanout` | Minimum raw-message or depth-0 summary fanout for useful compaction. |
171
+ | `condensedMinFanout` | Normal same-depth condensation grouping for depth 1+. |
172
+ | `condensedMinFanoutHard` | Hard-trigger/repair condensation grouping. |
173
+ | `sweepMaxDepth` | Preferred source-depth cap for routine threshold full sweep. |
174
+ | `summaryPrefixTargetTokens` | Optional target for summarized-prefix tokens; pressure condensation may go deeper if this target is missed. |
175
+ | `leafTargetTokens` | Leaf summary target. |
176
+ | `condensedTargetTokens` | Condensed summary target. |
177
+
178
+ ### Deprecated Compatibility Settings
179
+
180
+ | Key | Status |
181
+ | --- | --- |
182
+ | `incrementalMaxDepth` | Accepted as a deprecated alias for `sweepMaxDepth`. New config should use `sweepMaxDepth`. |
183
+ | `cacheAwareCompaction.*` | Accepted and visible as deprecated config. It no longer changes automatic compaction decisions. |
184
+ | `dynamicLeafChunkTokens.*` | Accepted and visible as deprecated config. Automatic compaction uses `leafChunkTokens` directly. |
185
+
186
+ Keeping these settings visible avoids breaking existing OpenClaw config and gives operators an explicit deprecation signal instead of silently hiding known keys.
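The alias behavior for `incrementalMaxDepth` could look roughly like the sketch below. `resolveDepthCap` is a hypothetical stand-in and not the package's `resolveSweepMaxDepth()`. Two details are assumptions: that `sweepMaxDepth` wins when both keys are set, and that the fallback is `1` (the value shown in the example config).

```typescript
// Hypothetical config resolution; precedence and fallback are assumptions.
interface DepthConfig {
  sweepMaxDepth?: number;
  incrementalMaxDepth?: number; // deprecated alias
}

interface ResolvedDepth {
  value: number;
  usedDeprecatedAlias: boolean; // lets callers surface a deprecation warning
}

function resolveDepthCap(cfg: DepthConfig): ResolvedDepth {
  if (cfg.sweepMaxDepth !== undefined) {
    return { value: cfg.sweepMaxDepth, usedDeprecatedAlias: false };
  }
  if (cfg.incrementalMaxDepth !== undefined) {
    return { value: cfg.incrementalMaxDepth, usedDeprecatedAlias: true };
  }
  return { value: 1, usedDeprecatedAlias: false }; // assumed fallback
}
```

Returning the `usedDeprecatedAlias` flag is one way to give operators the explicit deprecation signal the paragraph above describes.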
187
+
188
+ ## Stable Orphan Stripping Tradeoff
189
+
190
+ The old cache-aware assembly path could preserve a stable hot-cache boundary by overriding tool-call orphan stripping at a previously observed ordinal. This was removed with the rest of the cache-state-dependent assembly behavior.
191
+
192
+ Benefits of removal:
193
+
194
+ - the assembled prompt no longer changes based on inferred cache hotness
195
+ - assembly has fewer hidden stateful branches
196
+ - prompt-prefix behavior is easier to reason about and test
197
+ - cache telemetry remains diagnostic instead of controlling prompt mutation
198
+
199
+ Cost of removal:
200
+
201
+ - Lossless gives up one cache-oriented prefix-stability optimization for tool-call boundaries
202
+ - in some hot-cache sessions, ordinary tool-pair repair may alter the prefix sooner than the old stable-boundary override would have
203
+
204
+ The ordinary assembler still sanitizes tool-use/tool-result pairing, so this is a cache-efficiency tradeoff rather than a transcript-correctness tradeoff.
205
+
206
+ ## Test Coverage
207
+
208
+ The implementation should cover:
209
+
210
+ - below-threshold turns do not compact and do not record debt
211
+ - threshold crossings record only `"threshold"` debt in deferred mode
212
+ - inline mode runs threshold full sweep rather than leaf-trigger compaction
213
+ - background drain consumes threshold debt without prompt-cache telemetry or TTL
214
+ - `maintain()` consumes threshold debt without prompt-cache delay
215
+ - pre-assembly drain consumes threshold debt without prompt-cache delay
216
+ - legacy non-threshold debt is cleared when threshold no longer applies
217
+ - legacy non-threshold debt is upgraded to threshold full sweep when threshold still applies
218
+ - `compactFullSweep()` treats `sweepMaxDepth` as a preferred depth
219
+ - `compactFullSweep()` pressure-condenses past `sweepMaxDepth` when threshold or summary-prefix pressure remains
220
+ - the fresh tail remains verbatim and un-compacted
221
+
222
+ Removed or rewritten coverage:
223
+
224
+ - hot-cache delay gate tests
225
+ - cold-cache catch-up tests
226
+ - dynamic automatic leaf chunk tests
227
+ - automatic leaf debt tests
228
+ - engine-level `compactLeafAsync()` tests
229
+ - stable hot-cache orphan-stripping tests
230
+
231
+ ## Non-Goals
232
+
233
+ - Do not add a total-context target floor in this pass.
234
+ - Do not remove persisted telemetry or maintenance tables.
235
+ - Do not parallelize full-sweep leaf summaries yet. The current leaf prompt uses prior summary continuity, so parallelization would require a separate semantic design.
236
+ - Do not depend on provider cache `expiresAt`.
237
+ - Do not remove accepted deprecated config keys until a separate migration decision is made.
238
+
239
+ ## Follow-Up Watch Items
240
+
241
+ 1. If repeated threshold re-entry happens in live use, tune `summaryPrefixTargetTokens`, `contextThreshold`, `leafChunkTokens`, and fanout before adding a total-context target floor.
242
+ 2. If 20k leaf chunks make threshold sweeps too frequent, consider 30k before adding new mechanisms.
243
+ 3. If stable orphan stripping removal causes measurable cache regressions in tool-heavy sessions, revisit it as an assembly feature independent of cache-hotness inference.
@@ -24,12 +24,15 @@ Most installations only need to override a handful of keys. If you want a comple
24
24
  "freshTailCount": 64,
25
25
  "freshTailMaxTokens": 24000,
26
26
  "promptAwareEviction": false,
27
+ "stubLargeToolPayloads": false,
27
28
  "newSessionRetainDepth": 2,
28
29
  "leafMinFanout": 8,
29
30
  "condensedMinFanout": 4,
30
31
  "condensedMinFanoutHard": 2,
32
+ "sweepMaxDepth": 1,
31
33
  "incrementalMaxDepth": 1,
32
34
  "leafChunkTokens": 20000,
35
+ "summaryPrefixTargetTokens": 20000,
33
36
  "bootstrapMaxTokens": 6000,
34
37
  "leafTargetTokens": 2400,
35
38
  "condensedTargetTokens": 2000,
@@ -55,6 +58,7 @@ Most installations only need to override a handful of keys. If you want a comple
55
58
  "proactiveThresholdCompactionMode": "deferred",
56
59
  "autoRotateSessionFiles": {
57
60
  "enabled": true,
61
+ "createBackups": false,
58
62
  "sizeBytes": 2097152,
59
63
  "startup": "rotate",
60
64
  "runtime": "rotate"
@@ -66,7 +70,7 @@ Most installations only need to override a handful of keys. If you want a comple
66
70
  "hotCachePressureFactor": 4,
67
71
  "hotCacheBudgetHeadroomRatio": 0.2,
68
72
  "coldCacheObservationThreshold": 3,
69
- "criticalBudgetPressureRatio": 0.70
73
+ "criticalBudgetPressureRatio": 0.90
70
74
  },
71
75
  "dynamicLeafChunkTokens": {
72
76
  "enabled": true,
@@ -82,6 +86,7 @@ Notes on the example:
82
86
  - `largeFilesDir` shows the expanded default path shape. Both `databasePath` and `largeFilesDir` default to paths under `OPENCLAW_STATE_DIR` (which in turn falls back to `~/.openclaw`).
83
87
  - `timezone` has no fixed hardcoded default; at runtime it resolves from `TZ` first, then the system timezone. The example uses `America/Los_Angeles`.
84
88
  - `maxAssemblyTokenBudget` has no default. The example uses `30000` as a realistic cap for a 32k-class model.
89
+ - `summaryPrefixTargetTokens` has no fixed default. The example uses `20000`, which matches the derived default for large-context models with the default `leafChunkTokens`.
85
90
  - `databasePath` is the preferred key. `dbPath` is an accepted alias.
86
91
  - `largeFileThresholdTokens` is the preferred key. `largeFileTokenThreshold` is an accepted alias.
87
92
 
@@ -124,13 +129,14 @@ openclaw plugins install --link /path/to/lossless-claw
124
129
  | `transcriptGcEnabled` | `boolean` | `false` | `LCM_TRANSCRIPT_GC_ENABLED` | Enables transcript rewrite GC during `maintain()`; disabled by default so transcript rewrites stay opt-in. |
125
130
  | `proactiveThresholdCompactionMode` | `"deferred" \| "inline"` | `"deferred"` | `LCM_PROACTIVE_THRESHOLD_COMPACTION_MODE` | Controls whether proactive threshold compaction is deferred into maintenance debt by default or run inline for legacy behavior. |
126
131
  | `autoRotateSessionFiles.enabled` | `boolean` | `true` | `LCM_AUTO_ROTATE_SESSION_FILES_ENABLED` | Enables automatic rotation for oversized LCM-managed session JSONL files. |
132
+ | `autoRotateSessionFiles.createBackups` | `boolean` | `false` | `LCM_AUTO_ROTATE_SESSION_FILES_CREATE_BACKUPS` | Creates or replaces the rolling `rotate-latest` SQLite backup before automatic session-file rotation. Manual `/lcm rotate` backups are always created. |
127
133
  | `autoRotateSessionFiles.sizeBytes` | `integer` | `2097152` | `LCM_AUTO_ROTATE_SESSION_FILES_SIZE_BYTES` | Byte threshold that triggers automatic session-file rotation. |
128
134
  | `autoRotateSessionFiles.startup` | `"rotate" \| "warn" \| "off"` | `"rotate"` | `LCM_AUTO_ROTATE_SESSION_FILES_STARTUP` | Startup behavior for oversized indexed OpenClaw session transcripts that also have active LCM bootstrap state. |
129
135
  | `autoRotateSessionFiles.runtime` | `"rotate" \| "warn" \| "off"` | `"rotate"` | `LCM_AUTO_ROTATE_SESSION_FILES_RUNTIME` | Runtime behavior after `afterTurn()` and `maintain()` check the current transcript size. |
130
136
 
131
137
  > **Multi-profile note:** `OPENCLAW_STATE_DIR` (set by the host OpenClaw gateway) controls where state is stored. When two gateways run on the same host (e.g. separate bot personas), each gateway sets its own `OPENCLAW_STATE_DIR` and lossless-claw automatically uses that directory for the database, large-file payloads, auth-profile lookups, and legacy secrets — no per-profile plugin config is needed.
132
138
 
133
- Automatic session-file rotation uses the same safe path as `/lcm rotate`: runtime rotation replaces the rolling `rotate-latest` SQLite backup, rewrites only the live session transcript, keeps the active LCM conversation and durable history intact, and refreshes the bootstrap checkpoint. Startup rotation first scans OpenClaw's current indexed session stores for configured agents, then intersects those candidates with active LCM conversations and matching bootstrap file mappings. If multiple startup candidates need rotation, one pre-rotation LCM database backup is created for the batch before any transcript is rewritten. Rotation never runs for ignored sessions, stateless sessions, or sessions without active LCM state. The preserved JSONL tail follows the existing rotate behavior, which is controlled by `freshTailCount`.
139
+ Automatic session-file rotation rewrites only the live session transcript, keeps the active LCM conversation and durable history intact, and refreshes the bootstrap checkpoint. Startup rotation first scans OpenClaw's current indexed session stores for configured agents, then intersects those candidates with active LCM conversations and matching bootstrap file mappings. Automatic rotation does not create a SQLite backup by default; set `autoRotateSessionFiles.createBackups` to `true` to make runtime rotation replace the rolling `rotate-latest` backup and to make startup rotation create one pre-rotation LCM database backup for the batch before any transcript is rewritten. Manual `/lcm rotate` always keeps its backup-backed behavior regardless of this flag. Rotation never runs for ignored sessions, stateless sessions, or sessions without active LCM state. The preserved JSONL tail follows the existing rotate behavior, which is controlled by `freshTailCount`.
134
140
 
135
141
  Every automatic decision emits grep-able log lines prefixed with `[lcm] auto-rotate:`. Startup emits one compact summary line with `phase=startup`, `action=summary`, `scanned`, `eligible`, `rotated`, `warned`, `skipped`, `durationMs`, `bytesRemoved`, and backup fields when a batch backup was created; quiet skips such as missing files, missing bootstrap mappings, and below-threshold files are counted there instead of producing one line per candidate. Rotation detail lines include `phase`, `action`, `sessionId`, `sessionKey`, `sessionFile`, `sizeBytes`, `thresholdBytes`, `durationMs`, `backupPath`, `bytesRemoved`, `preservedTailMessageCount`, and `checkpointSize`; real warning lines include the same available context plus `reason` or `error`.
136
142
 
@@ -142,11 +148,14 @@ Every automatic decision emits grep-able log lines prefixed with `[lcm] auto-rot
142
148
  | `freshTailCount` | `integer` | `64` | `LCM_FRESH_TAIL_COUNT` | Number of newest messages always kept raw. |
143
149
  | `freshTailMaxTokens` | `integer` | unset | `LCM_FRESH_TAIL_MAX_TOKENS` | Optional token cap for the protected fresh tail. The newest message is always preserved even if it exceeds the cap. |
144
150
  | `promptAwareEviction` | `boolean` | `false` | `LCM_PROMPT_AWARE_EVICTION_ENABLED` | When enabled, budget-constrained assembly keeps older evictable items by prompt relevance instead of pure chronology. This improves retrieval under tight budgets, but it can reduce prompt-cache hit rates because the preserved prefix changes as prompts change. |
151
+ | `stubLargeToolPayloads` | `boolean` | `false` | `LCM_STUB_LARGE_TOOL_PAYLOADS` | When enabled, evictable tool-result rows backfilled with `messages.large_content` are assembled as `[LCM Tool Output: file_xxx ...]` stubs while the fresh tail stays inline. Requires `scripts/lcm-blob-migrate.mjs`, which defaults to the same large-files root as runtime LCM (`LCM_LARGE_FILES_DIR` or `${OPENCLAW_STATE_DIR}/lcm-files`). |
145
152
  | `leafMinFanout` | `integer` | `8` | `LCM_LEAF_MIN_FANOUT` | Minimum number of raw messages required before a leaf pass runs. |
146
153
  | `condensedMinFanout` | `integer` | `4` | `LCM_CONDENSED_MIN_FANOUT` | Number of same-depth summaries needed before condensation is attempted. |
147
154
  | `condensedMinFanoutHard` | `integer` | `2` | `LCM_CONDENSED_MIN_FANOUT_HARD` | Hard floor for condensation grouping during maintenance and repair flows. |
148
- | `incrementalMaxDepth` | `integer` | `1` | `LCM_INCREMENTAL_MAX_DEPTH` | Maximum automatic condensation depth after leaf compaction. Use `0` for leaf-only and `-1` for unlimited depth. |
149
- | `leafChunkTokens` | `integer` | `20000` | `LCM_LEAF_CHUNK_TOKENS` | Maximum source-token budget for a leaf compaction chunk. |
155
+ | `sweepMaxDepth` | `integer` | `1` | `LCM_SWEEP_MAX_DEPTH` | Preferred maximum condensation source depth during routine threshold sweeps. Use `0` for leaf-only and `-1` for unlimited depth. Pressure sweeps may go deeper when summarized context remains above target. |
156
+ | `incrementalMaxDepth` | `integer` | alias of `sweepMaxDepth` | `LCM_INCREMENTAL_MAX_DEPTH` | Deprecated alias for `sweepMaxDepth`. Kept so existing configs continue to load. |
157
+ | `leafChunkTokens` | `integer` | `20000` | `LCM_LEAF_CHUNK_TOKENS` | Maximum source-token budget for a leaf compaction chunk. Larger chunks reduce sweep frequency at the cost of slower individual summary calls. |
158
+ | `summaryPrefixTargetTokens` | `integer` | derived | `LCM_SUMMARY_PREFIX_TARGET_TOKENS` | Optional target for summarized-prefix tokens after a full sweep. If unset, Lossless derives `max(condensedTargetTokens, min(leafChunkTokens, floor(contextThreshold * tokenBudget * 0.5)))`. |
150
159
  | `bootstrapMaxTokens` | `integer` | `max(6000, floor(leafChunkTokens * 0.3))` | `LCM_BOOTSTRAP_MAX_TOKENS` | Maximum parent-history tokens imported when a new LCM conversation bootstraps. |
151
160
  | `leafTargetTokens` | `integer` | `2400` | `LCM_LEAF_TARGET_TOKENS` | Prompt target for leaf summary size. |
152
161
  | `condensedTargetTokens` | `integer` | `2000` | `LCM_CONDENSED_TARGET_TOKENS` | Prompt target for condensed summary size. |
@@ -170,6 +179,8 @@ Every automatic decision emits grep-able log lines prefixed with `[lcm] auto-rot
170
179
  | `summaryTimeoutMs` | `integer` | `60000` | `LCM_SUMMARY_TIMEOUT_MS` | Maximum time to wait for one model-backed summarizer call. |
171
180
  | `customInstructions` | `string` | `""` | `LCM_CUSTOM_INSTRUCTIONS` | Extra natural-language instructions injected into every summarization prompt. |
172
181
 
182
+ Summary calls are executed through OpenClaw's `api.runtime.llm.complete` capability. If you configure an explicit Lossless summary model (`summaryModel`, `largeFileSummaryModel`, or `fallbackProviders`), OpenClaw must allow that runtime LLM override under `plugins.entries.lossless-claw.llm.allowModelOverride` and `plugins.entries.lossless-claw.llm.allowedModels`. `openclaw doctor --fix` can add the minimal policy entries for configured Lossless summary models. Delegated expansion calls use OpenClaw's runtime sub-agent layer; explicit `expansionModel` values require `plugins.entries.lossless-claw.subagent.allowModelOverride` and a matching `subagent.allowedModels` entry, or `"*"` if you intentionally trust any expansion target. `openclaw doctor --fix` can add the minimal subagent policy, and `lcm_expand_query` retries once without the override if the host rejects it.
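The policy paths named above can be pictured as a nested config object. This is only a sketch written as a TypeScript literal. The key paths come from the paragraph above, but the value shapes (a boolean for `allowModelOverride`, a string array for `allowedModels`) are assumptions, as is placing `expansionModel` under `config`. Check the output of `openclaw doctor --fix` for the authoritative form.

```typescript
// Sketch only: key paths follow the README text; value shapes are assumed.
const openclawConfigSketch = {
  plugins: {
    entries: {
      "lossless-claw": {
        config: {
          // Explicit summary model, so the llm override policy below is needed.
          summaryModel: "anthropic/claude-sonnet-4-20250514",
          // Explicit expansion model (placement under `config` is assumed),
          // so the subagent override policy below is needed.
          expansionModel: "anthropic/claude-sonnet-4-20250514",
        },
        llm: {
          allowModelOverride: true, // assumed boolean
          allowedModels: ["anthropic/claude-sonnet-4-20250514"], // assumed string list
        },
        subagent: {
          allowModelOverride: true, // assumed boolean
          // "*" trusts any expansion target; list specific models to restrict it.
          allowedModels: ["*"],
        },
      },
    },
  },
};
```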
183
+
173
184
  ### Fallbacks, circuit breaking, and safety rails
174
185
 
175
186
  | Key | Type | Default | Env override | Purpose |
@@ -184,32 +195,33 @@ Every automatic decision emits grep-able log lines prefixed with `[lcm] auto-rot
184
195
 
185
196
  | Key | Type | Default | Env override | Purpose |
186
197
  | --- | --- | --- | --- | --- |
187
- | `cacheAwareCompaction.enabled` | `boolean` | `true` | `LCM_CACHE_AWARE_COMPACTION_ENABLED` | Defers incremental leaf compaction more aggressively when prompt-cache telemetry indicates a hot cache. |
188
- | `cacheAwareCompaction.cacheTTLSeconds` | `integer` | `300` | `LCM_CACHE_TTL_SECONDS` | Fallback cache TTL used when deferred Anthropic compaction has provider/model telemetry but no explicit runtime cache-retention window. |
189
- | `cacheAwareCompaction.maxColdCacheCatchupPasses` | `integer` | `2` | `LCM_MAX_COLD_CACHE_CATCHUP_PASSES` | Maximum bounded catch-up passes allowed in one maintenance cycle when cache telemetry is cold. |
190
- | `cacheAwareCompaction.hotCachePressureFactor` | `number` | `4` | `LCM_HOT_CACHE_PRESSURE_FACTOR` | Multiplier applied to the hot-cache leaf trigger before raw-history pressure overrides cache preservation. |
191
- | `cacheAwareCompaction.hotCacheBudgetHeadroomRatio` | `number` | `0.2` | `LCM_HOT_CACHE_BUDGET_HEADROOM_RATIO` | Minimum fraction of the real token budget that must remain free before hot-cache incremental compaction is skipped entirely. |
192
- | `cacheAwareCompaction.coldCacheObservationThreshold` | `integer` | `3` | `LCM_COLD_CACHE_OBSERVATION_THRESHOLD` | Consecutive cold observations required before non-explicit cache misses are treated as truly cold. This dampens one-off routing noise and provider failover blips. |
193
- | `cacheAwareCompaction.criticalBudgetPressureRatio` | `number` | `0.70` | `LCM_CRITICAL_BUDGET_PRESSURE_RATIO` | Fraction of the token budget at which deferred compaction bypasses hot-cache delay so prompt-mutating debt can run before overflow. Set to `1` to disable this bypass. |
198
+ | `cacheAwareCompaction.enabled` | `boolean` | `true` | `LCM_CACHE_AWARE_COMPACTION_ENABLED` | Deprecated. Accepted for config compatibility but no longer used for automatic compaction decisions. |
199
+ | `cacheAwareCompaction.cacheTTLSeconds` | `integer` | `300` | `LCM_CACHE_TTL_SECONDS` | Deprecated. Accepted for config compatibility; threshold debt no longer waits for cache TTL. |
200
+ | `cacheAwareCompaction.maxColdCacheCatchupPasses` | `integer` | `2` | `LCM_MAX_COLD_CACHE_CATCHUP_PASSES` | Deprecated. Automatic cold-cache catch-up passes were removed. |
201
+ | `cacheAwareCompaction.hotCachePressureFactor` | `number` | `4` | `LCM_HOT_CACHE_PRESSURE_FACTOR` | Deprecated. Hot-cache raw-history pressure no longer drives automatic compaction. |
202
+ | `cacheAwareCompaction.hotCacheBudgetHeadroomRatio` | `number` | `0.2` | `LCM_HOT_CACHE_BUDGET_HEADROOM_RATIO` | Deprecated. Hot-cache budget headroom no longer defers automatic threshold compaction. |
203
+ | `cacheAwareCompaction.coldCacheObservationThreshold` | `integer` | `3` | `LCM_COLD_CACHE_OBSERVATION_THRESHOLD` | Deprecated. Cold-cache streaks remain observable telemetry only. |
204
+ | `cacheAwareCompaction.criticalBudgetPressureRatio` | `number` | `0.90` | `LCM_CRITICAL_BUDGET_PRESSURE_RATIO` | Deprecated. `contextThreshold` is the only automatic compaction threshold. |
194
205
 
195
206
  #### `dynamicLeafChunkTokens`
196
207
 
197
208
  | Key | Type | Default | Env override | Purpose |
198
209
  | --- | --- | --- | --- | --- |
199
- | `dynamicLeafChunkTokens.enabled` | `boolean` | `true` | `LCM_DYNAMIC_LEAF_CHUNK_TOKENS_ENABLED` | Enables dynamic working leaf chunk sizes for busier sessions. |
200
- | `dynamicLeafChunkTokens.max` | `integer` | `max(leafChunkTokens, floor(leafChunkTokens * 2))` | `LCM_DYNAMIC_LEAF_CHUNK_TOKENS_MAX` | Upper bound for the dynamic working chunk size. With the default `leafChunkTokens=20000`, this resolves to `40000`. |
210
+ | `dynamicLeafChunkTokens.enabled` | `boolean` | `true` | `LCM_DYNAMIC_LEAF_CHUNK_TOKENS_ENABLED` | Deprecated. Accepted for config compatibility but no longer used by automatic compaction. |
211
+ | `dynamicLeafChunkTokens.max` | `integer` | `max(leafChunkTokens, floor(leafChunkTokens * 2))` | `LCM_DYNAMIC_LEAF_CHUNK_TOKENS_MAX` | Deprecated. With the default `leafChunkTokens=20000`, this resolves to `40000`, but automatic compaction uses `leafChunkTokens`. |
212
+
213
+ ### Threshold full-sweep compaction
201
214
 
202
- ### Cache-aware incremental compaction
215
+ Automatic compaction is threshold-only:
203
216
 
204
- When cache-aware compaction is enabled:
217
+ - `afterTurn()` evaluates `contextThreshold` against the active token budget
218
+ - below threshold, no automatic compaction runs and no leaf debt is recorded
219
+ - at or above threshold, inline mode runs a threshold full sweep immediately
220
+ - deferred mode records one coalesced `"threshold"` maintenance row and drains it in the background, during `maintain()`, or just before the next prompt is assembled
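The decision described in the list above can be sketched as a small pure function. The function name, the result type, and the reading of `contextThreshold` as a fraction of the token budget are illustrative assumptions, not the plugin's actual API.

```typescript
// Hypothetical names for illustration; not the plugin's real API.
type CompactionMode = "inline" | "deferred";

type ThresholdDecision =
  | { action: "none" }
  | { action: "sweep-now" }
  | { action: "record-debt"; reason: "threshold" };

function decideAfterTurn(
  currentTokens: number,
  tokenBudget: number,
  contextThreshold: number, // assumed to be a fraction of tokenBudget, e.g. 0.75
  mode: CompactionMode,
): ThresholdDecision {
  // Below threshold: no automatic compaction runs and no debt is recorded.
  if (currentTokens < contextThreshold * tokenBudget) {
    return { action: "none" };
  }
  // At or above threshold: inline mode sweeps immediately,
  // deferred mode records one coalesced "threshold" row to drain later.
  return mode === "inline"
    ? { action: "sweep-now" }
    : { action: "record-debt", reason: "threshold" };
}
```

Note that cache telemetry is not an input here, which matches the statement that cache hotness no longer delays threshold debt.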
205
221
 
206
- - hot cache stretches the incremental leaf trigger to `dynamicLeafChunkTokens.max`
207
- - hot cache skips incremental maintenance entirely when the assembled context is still comfortably below the real token budget
208
- - hot cache also gets a short hysteresis window so one ambiguous turn does not immediately discard a recently healthy cache signal
209
- - cold cache still allows bounded catch-up passes via `cacheAwareCompaction.maxColdCacheCatchupPasses`
210
- - once `currentTokenCount >= criticalBudgetPressureRatio * tokenBudget`, deferred compaction bypasses hot-cache delay so prompt-mutating debt can run before emergency overflow handling
222
+ Lossless still records prompt-cache telemetry for status and diagnostics, but cache hotness no longer delays threshold debt. Legacy `cacheAwareCompaction.*` and `dynamicLeafChunkTokens.*` settings remain accepted so existing OpenClaw config continues to load, but they do not change automatic compaction behavior.
211
223
 
212
- When incremental leaf compaction still runs on a hot cache, follow-on condensed passes are suppressed so the maintenance cycle only pays for the leaf pass that was explicitly justified.
224
+ Full sweeps first run leaf passes until there are no more eligible raw-message chunks outside the fresh tail. Condensation is then driven by summarized-prefix pressure: the routine condensation phase obeys `sweepMaxDepth`, and if the summarized prefix still exceeds `summaryPrefixTargetTokens`, a pressure phase may use `condensedMinFanoutHard` and condense deeper. Total context pressure starts the sweep, but does not by itself force deeper condensation once the raw prefix has been summarized.
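For reference, the derived default for `summaryPrefixTargetTokens` given in the configuration table can be worked through as follows. The helper name is hypothetical, and the `contextThreshold` and `tokenBudget` values in the usage note are made-up inputs rather than documented defaults.

```typescript
// Hypothetical helper; field names mirror the config keys from the table.
interface PrefixTargetInputs {
  condensedTargetTokens: number;
  leafChunkTokens: number;
  contextThreshold: number;
  tokenBudget: number;
  summaryPrefixTargetTokens?: number;
}

function deriveSummaryPrefixTargetTokens(cfg: PrefixTargetInputs): number {
  // An explicit setting wins over the derived value.
  if (cfg.summaryPrefixTargetTokens !== undefined) {
    return cfg.summaryPrefixTargetTokens;
  }
  // max(condensedTargetTokens, min(leafChunkTokens, floor(contextThreshold * tokenBudget * 0.5)))
  const halfThreshold = Math.floor(cfg.contextThreshold * cfg.tokenBudget * 0.5);
  return Math.max(cfg.condensedTargetTokens, Math.min(cfg.leafChunkTokens, halfThreshold));
}
```

With `condensedTargetTokens=2000`, `leafChunkTokens=20000`, and an assumed `contextThreshold=0.75`, a 200000-token budget yields 20000 (capped by `leafChunkTokens`), a 32000-token budget yields 12000, and a 4000-token budget yields 2000 (floored by `condensedTargetTokens`).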
213
225
 
214
226
  ### Prompt-aware eviction
215
227
 
@@ -235,12 +247,12 @@ Compaction summarization resolves candidates in this order:
235
247
  1. `LCM_SUMMARY_MODEL` and `LCM_SUMMARY_PROVIDER`
236
248
  2. `plugins.entries.lossless-claw.config.summaryModel` and `summaryProvider`
237
249
  3. OpenClaw's default compaction model
238
- 4. Legacy per-call provider and model hints
250
+ 4. Runtime/session provider and model hints from OpenClaw
239
251
  5. `fallbackProviders`
240
252
 
241
253
  If `summaryModel` already contains a provider prefix such as `anthropic/claude-sonnet-4-20250514`, `summaryProvider` is ignored for that candidate.
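The provider-prefix rule can be illustrated with a minimal sketch. The function name and return shape are hypothetical, and splitting at the first `/` is an assumption about how the prefix is detected.

```typescript
// Hypothetical sketch of the prefix rule; not the plugin's actual resolver.
interface SummaryTarget {
  provider: string | undefined;
  model: string;
}

function resolveSummaryTarget(summaryModel: string, summaryProvider?: string): SummaryTarget {
  const slash = summaryModel.indexOf("/");
  if (slash > 0) {
    // A provider prefix is present, so summaryProvider is ignored for this candidate.
    return {
      provider: summaryModel.slice(0, slash),
      model: summaryModel.slice(slash + 1),
    };
  }
  // No prefix: fall back to the separately configured provider, if any.
  return { provider: summaryProvider, model: summaryModel };
}
```

Under these assumptions, `resolveSummaryTarget("anthropic/claude-sonnet-4-20250514", "openai")` resolves to provider `anthropic`, regardless of the second argument.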
242
254
 
243
- Runtime-managed OAuth providers are supported here too. In particular, `openai-codex` and `github-copilot` auth profiles can be used for summary and expansion calls without a separate API key.
255
+ Lossless does not resolve provider credentials directly for compaction summaries. OpenClaw's runtime LLM layer owns provider/model preparation, auth profiles, OAuth refresh, base URLs, and dispatch. Lossless only selects the requested summary target and passes it to the host runtime, where model override policy is enforced.
244
256
 
245
257
  A practical starting point for cost-sensitive setups is:
246
258
 
@@ -285,11 +297,12 @@ This keeps long-term history available while still giving users a real clean-sla
285
297
  Lossless-claw now defaults `proactiveThresholdCompactionMode` to `deferred`.
286
298
 
287
299
  - deferred mode records a single coalesced maintenance debt row per conversation
288
- - deferred mode persists provider/model/cache telemetry so Anthropic-family sessions can avoid rewriting a still-hot prompt cache
289
- - `maintain()` can still process non-prompt-mutating work when the host explicitly opts in to deferred execution, but it leaves prompt-mutating debt pending while Anthropic cache is still hot
290
- - `assemble()` consumes deferred prompt-mutating debt pre-assembly once the cache is cold or the next turn is already approaching overflow
300
+ - new deferred compaction debt is only created for `contextThreshold` pressure and uses reason `"threshold"`
301
+ - `maintain()` consumes threshold debt when the host explicitly opts in to deferred execution
302
+ - `assemble()` consumes pending threshold debt before building the next prompt
303
+ - old non-threshold debt from earlier builds is revalidated; if the conversation is no longer over threshold, it is cleared as a no-op
291
304
  - `/lcm status` / `/lossless status` shows the current maintenance state, including pending/running/last-failure details
292
- - status output also surfaces the latest API/cache telemetry so operators can see whether a deferred debt item is being preserved for cache-safety reasons
305
+ - status output also surfaces the latest API/cache telemetry as diagnostics, not as a deferral gate
293
306
  - set `proactiveThresholdCompactionMode` to `inline` only if you need the legacy inline proactive compaction behavior for compatibility
294
307
 
295
308
  ### `/lcm rotate`