@martian-engineering/lossless-claw 0.9.1 → 0.9.3
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/README.md +7 -2
- package/dist/index.js +100 -58
- package/docs/architecture.md +1 -1
- package/docs/configuration.md +23 -1
- package/docs/tui.md +47 -9
- package/openclaw.plugin.json +24 -0
- package/package.json +26 -4
- package/skills/lossless-claw/references/config.md +50 -0
package/docs/architecture.md
CHANGED

@@ -125,7 +125,7 @@ The assembler runs before each model turn and builds the message array:
 2. Resolve each item — summaries become user messages with XML wrappers; messages are reconstructed from parts.
 3. Split into evictable prefix and protected fresh tail (last `freshTailCount` raw messages).
 4. Compute fresh tail token cost (always included, even if over budget).
-5. Fill remaining budget from the evictable set
+5. Fill remaining budget from the evictable set. By default this keeps newest older items and drops the oldest; when `promptAwareEviction` is enabled and a searchable prompt is present, the evictable prefix is ranked by prompt relevance first and then restored to chronological order.
 6. Normalize assistant content to array blocks (Anthropic API compatibility).
 7. Sanitize tool-use/result pairing (ensures every tool_result has a matching tool_use).
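The changed step 5 (budget fill with optional prompt-aware ranking, then chronological restore) can be sketched as follows. This is an illustrative model only, not the plugin's actual code: the `fillBudget` name, the item shape, and the substring-match relevance score are all assumptions.

```typescript
interface ContextItem {
  ordinal: number; // chronological position in the conversation
  tokens: number;
  text: string;
}

function fillBudget(
  evictable: ContextItem[],
  budget: number,
  promptAwareEviction: boolean,
  promptTerms: string[],
): ContextItem[] {
  // Default policy: prefer the newest older items, dropping the oldest.
  let candidates = [...evictable].sort((a, b) => b.ordinal - a.ordinal);

  if (promptAwareEviction && promptTerms.length > 0) {
    // Prompt-aware policy: rank the evictable prefix by relevance first.
    const score = (it: ContextItem) =>
      promptTerms.filter((t) => it.text.includes(t)).length;
    candidates = [...evictable].sort((a, b) => score(b) - score(a));
  }

  // Greedily keep items that still fit the remaining budget.
  const kept: ContextItem[] = [];
  let used = 0;
  for (const item of candidates) {
    if (used + item.tokens <= budget) {
      kept.push(item);
      used += item.tokens;
    }
  }
  // Restore chronological order before assembly, as the docs describe.
  return kept.sort((a, b) => a.ordinal - b.ordinal);
}
```

Either way the protected fresh tail is untouched; only which older items survive changes.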
package/docs/configuration.md
CHANGED

@@ -23,6 +23,7 @@ Most installations only need to override a handful of keys. If you want a comple
 "contextThreshold": 0.75,
 "freshTailCount": 64,
 "freshTailMaxTokens": 24000,
+"promptAwareEviction": false,
 "newSessionRetainDepth": 2,
 "leafMinFanout": 8,
 "condensedMinFanout": 4,

@@ -54,9 +55,12 @@ Most installations only need to override a handful of keys. If you want a comple
 "proactiveThresholdCompactionMode": "deferred",
 "cacheAwareCompaction": {
 "enabled": true,
+"cacheTTLSeconds": 300,
 "maxColdCacheCatchupPasses": 2,
 "hotCachePressureFactor": 4,
-"hotCacheBudgetHeadroomRatio": 0.2
+"hotCacheBudgetHeadroomRatio": 0.2,
+"coldCacheObservationThreshold": 3,
+"criticalBudgetPressureRatio": 0.70
 },
 "dynamicLeafChunkTokens": {
 "enabled": true,

@@ -123,6 +127,7 @@ openclaw plugins install --link /path/to/lossless-claw
 | `contextThreshold` | `number` | `0.75` | `LCM_CONTEXT_THRESHOLD` | Fraction of the active model context window that triggers compaction. |
 | `freshTailCount` | `integer` | `64` | `LCM_FRESH_TAIL_COUNT` | Number of newest messages always kept raw. |
 | `freshTailMaxTokens` | `integer` | unset | `LCM_FRESH_TAIL_MAX_TOKENS` | Optional token cap for the protected fresh tail. The newest message is always preserved even if it exceeds the cap. |
+| `promptAwareEviction` | `boolean` | `false` | `LCM_PROMPT_AWARE_EVICTION_ENABLED` | When enabled, budget-constrained assembly keeps older evictable items by prompt relevance instead of pure chronology. This improves retrieval under tight budgets, but it can reduce prompt-cache hit rates because the preserved prefix changes as prompts change. |
 | `leafMinFanout` | `integer` | `8` | `LCM_LEAF_MIN_FANOUT` | Minimum number of raw messages required before a leaf pass runs. |
 | `condensedMinFanout` | `integer` | `4` | `LCM_CONDENSED_MIN_FANOUT` | Number of same-depth summaries needed before condensation is attempted. |
 | `condensedMinFanoutHard` | `integer` | `2` | `LCM_CONDENSED_MIN_FANOUT_HARD` | Hard floor for condensation grouping during maintenance and repair flows. |

@@ -171,6 +176,7 @@ openclaw plugins install --link /path/to/lossless-claw
 | `cacheAwareCompaction.hotCachePressureFactor` | `number` | `4` | `LCM_HOT_CACHE_PRESSURE_FACTOR` | Multiplier applied to the hot-cache leaf trigger before raw-history pressure overrides cache preservation. |
 | `cacheAwareCompaction.hotCacheBudgetHeadroomRatio` | `number` | `0.2` | `LCM_HOT_CACHE_BUDGET_HEADROOM_RATIO` | Minimum fraction of the real token budget that must remain free before hot-cache incremental compaction is skipped entirely. |
 | `cacheAwareCompaction.coldCacheObservationThreshold` | `integer` | `3` | `LCM_COLD_CACHE_OBSERVATION_THRESHOLD` | Consecutive cold observations required before non-explicit cache misses are treated as truly cold. This dampens one-off routing noise and provider failover blips. |
+| `cacheAwareCompaction.criticalBudgetPressureRatio` | `number` | `0.70` | `LCM_CRITICAL_BUDGET_PRESSURE_RATIO` | Fraction of the token budget at which deferred compaction bypasses hot-cache delay so prompt-mutating debt can run before overflow. Set to `1` to disable this bypass. |
 
 #### `dynamicLeafChunkTokens`
 

@@ -187,9 +193,25 @@ When cache-aware compaction is enabled:
 - hot cache skips incremental maintenance entirely when the assembled context is still comfortably below the real token budget
 - hot cache also gets a short hysteresis window so one ambiguous turn does not immediately discard a recently healthy cache signal
 - cold cache still allows bounded catch-up passes via `cacheAwareCompaction.maxColdCacheCatchupPasses`
+- once `currentTokenCount >= criticalBudgetPressureRatio * tokenBudget`, deferred compaction bypasses hot-cache delay so prompt-mutating debt can run before emergency overflow handling
 
 When incremental leaf compaction still runs on a hot cache, follow-on condensed passes are suppressed so the maintenance cycle only pays for the leaf pass that was explicitly justified.
 
+### Prompt-aware eviction
+
+When `promptAwareEviction` is enabled:
+
+- the protected fresh tail is still preserved exactly as usual
+- only the older evictable prefix is affected
+- if the evictable prefix does not fit and the current prompt has searchable terms, lossless-claw keeps the most relevant older items instead of just the newest older items
+
+Tradeoff:
+
+- this can improve retrieval quality when the prompt is asking about an older topic and the assembled context is tight
+- it also makes the assembled prefix less stable for providers with prefix-based prompt caching, because different prompts can keep different older items
+
+If Anthropic prompt-cache stability matters more than topical recall under pressure, set `promptAwareEviction: false`.
+
 ## Behavior notes
 
 ### Summary model resolution
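The new `criticalBudgetPressureRatio` bypass described in the diffed docs can be pictured with a minimal TypeScript sketch. The state shape and the `shouldRunDeferredCompaction` helper are invented for illustration; only the threshold formula `currentTokenCount >= ratio * tokenBudget` comes from the documentation, and the real decision logic surely has more inputs.

```typescript
interface CompactionState {
  currentTokenCount: number;
  tokenBudget: number;
  cacheIsHot: boolean;
  criticalBudgetPressureRatio: number; // documented default 0.70; 1 disables the bypass
}

function shouldRunDeferredCompaction(s: CompactionState): boolean {
  // Critical pressure check from the docs: count >= ratio * budget.
  if (s.currentTokenCount >= s.criticalBudgetPressureRatio * s.tokenBudget) {
    // Bypass hot-cache delay so prompt-mutating debt runs before
    // emergency overflow handling kicks in.
    return true;
  }
  // Below the threshold, a hot cache defers compaction to keep the
  // cached prompt prefix stable.
  return !s.cacheIsHot;
}
```

A cold cache still compacts normally; the bypass only matters while the cache is hot and the live prompt is close to the budget.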
package/docs/tui.md
CHANGED

@@ -237,6 +237,34 @@ The confirmation screen shows:
 
 Each interactive operation also has a standalone CLI equivalent for scripting and batch operations.
 
+### `lcm-tui doctor`
+
+Scans for genuinely truncated summaries and can rewrite them in place. This is narrower than `repair`: it looks for specific truncation marker shapes instead of the generic fallback-summary marker.
+
+```bash
+# Preview repairs for one conversation
+lcm-tui doctor 44 --show-diff
+
+# Apply repairs through Codex CLI OAuth after `codex login`
+lcm-tui doctor 44 --apply --provider openai-codex --model gpt-5.3-codex
+
+# Scan only across every conversation
+lcm-tui doctor --all
+```
+
+| Flag | Description |
+|------|-------------|
+| `--apply` | Write repaired summaries to the database |
+| `--summary` | Scan only and show counts |
+| `--all` | Scan all conversations (discovery mode only) |
+| `--provider <id>` | API provider (default: anthropic) |
+| `--model <model>` | API model (default: `claude-haiku-4-5`) |
+| `--base-url <url>` | Custom API base URL (overrides config and env) |
+| `--show-diff` | Show unified diff for each fix |
+| `--timestamps` | Inject timestamps into rewrite source text |
+
+Use `--provider openai-codex` when you want ChatGPT Plus/Pro OAuth from the Codex CLI. Keep `--provider openai` for direct OpenAI-compatible HTTP calls with a raw `OPENAI_API_KEY`, including custom `--base-url` proxies.
+
 ### `lcm-tui repair`
 
 Finds and fixes corrupted summaries (those containing the `[LCM fallback summary]` marker from failed summarization attempts).

@@ -253,6 +281,12 @@ lcm-tui repair 44 --apply
 
 # Repair a specific summary
 lcm-tui repair 44 --summary-id sum_abc123 --apply
+
+# Repair through Codex CLI OAuth after `codex login`
+lcm-tui repair 44 --apply --provider openai-codex --model gpt-5.3-codex
+
+# Repair through a custom OpenAI-compatible proxy with a raw API key
+lcm-tui repair 44 --apply --provider openai --model gpt-5.3-codex --base-url https://proxy.example.com/openai
 ```
 
 The repair process:

@@ -260,7 +294,7 @@ The repair process:
 2. Orders them bottom-up: leaves first (in context ordinal order), then condensed nodes by ascending depth
 3. Reconstructs source material from linked messages (leaves) or child summaries (condensed)
 4. Resolves `previous_context` for each node (for deduplication in the prompt)
-5. Sends to
+5. Sends to the resolved provider API with the appropriate depth prompt
 6. Updates the database in a single transaction
 
 | Flag | Description |

@@ -268,6 +302,9 @@ The repair process:
 | `--apply` | Write repairs to database (default: dry run) |
 | `--all` | Scan all conversations |
 | `--summary-id <id>` | Target a specific summary |
+| `--provider <id>` | API provider (inferred from `--model` when omitted) |
+| `--model <model>` | API model (default depends on provider) |
+| `--base-url <url>` | Custom API base URL (overrides config and env) |
 | `--verbose` | Show content hashes and previews |
 
 ### `lcm-tui rewrite`

@@ -284,10 +321,10 @@ lcm-tui rewrite 44 --depth 0 --apply
 # Rewrite everything bottom-up
 lcm-tui rewrite 44 --all --apply --diff
 
-# Rewrite with
-lcm-tui rewrite 44 --summary sum_abc123 --provider openai --model gpt-5.3-codex --apply
+# Rewrite with Codex CLI OAuth after `codex login`
+lcm-tui rewrite 44 --summary sum_abc123 --provider openai-codex --model gpt-5.3-codex --apply
 
-# Rewrite through a custom OpenAI-compatible proxy
+# Rewrite through a custom OpenAI-compatible proxy with a raw API key
 lcm-tui rewrite 44 --summary sum_abc123 --provider openai --model gpt-5.3-codex --base-url https://proxy.example.com/openai --apply
 
 # Use custom prompt templates

@@ -380,10 +417,10 @@ lcm-tui backfill my-agent session_abc123 --apply --recompact --single-root
 # Import + compact + transplant into an active conversation
 lcm-tui backfill my-agent session_abc123 --apply --transplant-to 653
 
-# Backfill using
-lcm-tui backfill my-agent session_abc123 --apply --provider openai --model gpt-5.3-codex
+# Backfill using Codex CLI OAuth after `codex login`
+lcm-tui backfill my-agent session_abc123 --apply --provider openai-codex --model gpt-5.3-codex
 
-# Backfill through a custom OpenAI-compatible proxy
+# Backfill through a custom OpenAI-compatible proxy with a raw API key
 lcm-tui backfill my-agent session_abc123 --apply --provider openai --model gpt-5.3-codex --base-url https://proxy.example.com/openai
 ```
 

@@ -484,14 +521,15 @@ Resolution order:
 
 If the provider auth profile mode is `oauth` (not `api_key`), set the provider API key environment variable explicitly.
 
-
+Summary-producing operations (`doctor`, `repair`, `rewrite`, `backfill`, and interactive rewrite `w`/`W`) can be configured with:
 - `LCM_TUI_SUMMARY_PROVIDER`
 - `LCM_TUI_SUMMARY_MODEL`
 - `LCM_TUI_SUMMARY_BASE_URL`
-- `LCM_TUI_CONVERSATION_WINDOW_SIZE` (default `200`)
 
 It also honors `LCM_SUMMARY_PROVIDER` / `LCM_SUMMARY_MODEL` / `LCM_SUMMARY_BASE_URL` as fallback.
 
+Separately, the conversation browser window size uses `LCM_TUI_CONVERSATION_WINDOW_SIZE` (default `200`).
+
 ## Database
 
 The TUI operates directly on the SQLite database at `~/.openclaw/lcm.db`. All write operations (rewrite, dissolve, repair, transplant, backfill) use transactions. Changes take effect on the next conversation turn — the running OpenClaw instance picks up database changes automatically.
package/openclaw.plugin.json
CHANGED

@@ -4,6 +4,14 @@
 "skills": [
 "skills/lossless-claw"
 ],
+"contracts": {
+"tools": [
+"lcm_grep",
+"lcm_describe",
+"lcm_expand",
+"lcm_expand_query"
+]
+},
 "uiHints": {
 "enabled": {
 "label": "Enabled",

@@ -25,6 +33,10 @@
 "label": "Fresh Tail Max Tokens",
 "help": "Optional token cap for the protected fresh tail; the newest message is always preserved"
 },
+"promptAwareEviction": {
+"label": "Prompt-Aware Eviction",
+"help": "When enabled, budget-constrained assembly keeps older context by prompt relevance instead of pure chronology. This can improve recall under tight budgets, but it can also reduce provider prompt-cache hit rates because the preserved prefix changes as prompts change."
+},
 "leafChunkTokens": {
 "label": "Leaf Chunk Tokens",
 "help": "Maximum source tokens per leaf compaction chunk before summarization"

@@ -173,6 +185,10 @@
 "label": "Cold Cache Observation Threshold",
 "help": "Consecutive cold observations required before non-explicit cache misses are treated as truly cold"
 },
+"cacheAwareCompaction.criticalBudgetPressureRatio": {
+"label": "Critical Budget Pressure Ratio",
+"help": "Fraction of token budget at which deferred compaction fires regardless of prompt-cache state. Defaults to 0.70 — set to 1 to disable the override and let cache-aware throttling fully control deferral."
+},
 "dynamicLeafChunkTokens.enabled": {
 "label": "Dynamic Leaf Chunk Tokens",
 "help": "When enabled, incremental compaction uses a larger working leaf chunk in busy sessions and keeps the static floor in quieter sessions"

@@ -226,6 +242,9 @@
 "type": "integer",
 "minimum": 0
 },
+"promptAwareEviction": {
+"type": "boolean"
+},
 "leafChunkTokens": {
 "type": "integer",
 "minimum": 1

@@ -363,6 +382,11 @@
 "coldCacheObservationThreshold": {
 "type": "integer",
 "minimum": 1
+},
+"criticalBudgetPressureRatio": {
+"type": "number",
+"minimum": 0,
+"maximum": 1
 }
 }
 },
package/package.json
CHANGED

@@ -1,6 +1,6 @@
 {
 "name": "@martian-engineering/lossless-claw",
-"version": "0.9.1",
+"version": "0.9.3",
 "description": "Lossless Context Management plugin for OpenClaw — DAG-based conversation summarization with incremental compaction",
 "type": "module",
 "main": "dist/index.js",

@@ -31,27 +31,49 @@
 "LICENSE"
 ],
 "dependencies": {
-"@mariozechner/pi-agent-core": "*",
-"@mariozechner/pi-ai": "*",
 "@sinclair/typebox": "0.34.48"
 },
 "devDependencies": {
 "@changesets/changelog-github": "^0.6.0",
 "@changesets/cli": "^2.30.0",
+"@mariozechner/pi-agent-core": "0.66.1",
+"@mariozechner/pi-ai": "0.66.1",
+"@mariozechner/pi-coding-agent": "0.66.1",
 "esbuild": "^0.28.0",
 "typescript": "^5.7.0",
 "vitest": "^3.0.0"
 },
 "peerDependencies": {
+"@mariozechner/pi-agent-core": "*",
+"@mariozechner/pi-ai": "*",
+"@mariozechner/pi-coding-agent": "*",
 "openclaw": "*"
 },
+"peerDependenciesMeta": {
+"@mariozechner/pi-agent-core": {
+"optional": true
+},
+"@mariozechner/pi-ai": {
+"optional": true
+},
+"@mariozechner/pi-coding-agent": {
+"optional": true
+}
+},
 "publishConfig": {
 "access": "public"
 },
 "openclaw": {
 "extensions": [
 "./dist/index.js"
-]
+],
+"compat": {
+"pluginApi": ">=2026.2.17",
+"minGatewayVersion": "2026.2.17"
+},
+"build": {
+"openclawVersion": "2026.2.17"
+}
 },
 "repository": {
 "type": "git",
package/skills/lossless-claw/references/config.md
CHANGED

@@ -57,6 +57,21 @@ Good starting range:
 - Leave unset unless large tool outputs are forcing avoidable cost or overflow.
 - Start around `12000` to `32000` when you want a softer, size-aware fresh tail.
 
+### `promptAwareEviction`
+
+Controls whether budget-constrained assembly keeps older context by prompt relevance or pure chronology.
+
+Why it matters:
+
+- when enabled, lossless-claw can keep an older but on-topic summary instead of a newer irrelevant one
+- this can improve retrieval quality when the assembled context is tight
+- it also makes the preserved prompt prefix less stable, which can reduce prefix-based prompt-cache hit rates
+
+Good default:
+
+- `false`
+- enable it only when topical older-context recall under tight budgets matters more than prompt-cache stability
+
 ### `leafChunkTokens`
 
 Caps how much raw material gets summarized into one leaf summary.

@@ -88,12 +103,14 @@ Good defaults:
 - `hotCachePressureFactor: 4`
 - `hotCacheBudgetHeadroomRatio: 0.2`
 - `coldCacheObservationThreshold: 3`
+- `criticalBudgetPressureRatio: 0.70`
 
 Operationally:
 
 - hot cache stretches the incremental leaf trigger to `dynamicLeafChunkTokens.max`
 - hot cache skips incremental maintenance entirely when the assembled context is comfortably below the real token budget
 - hot cache gets a short hysteresis window so a recent cache hit stays "hot" briefly unless telemetry shows a break
+- critical token-budget pressure bypasses hot-cache delay once the live prompt reaches `criticalBudgetPressureRatio * tokenBudget`
 - if hot-cache maintenance still runs, it stays leaf-only and suppresses follow-on condensed passes
 
 ### `dynamicLeafChunkTokens`

@@ -232,6 +249,21 @@ See high-impact settings above.
 
 See high-impact settings above.
 
+### `promptAwareEviction`
+
+Boolean toggle for prompt-sensitive selection inside the evictable prefix during assembly.
+
+Why it matters:
+
+- only applies when the older evictable prefix does not fit the token budget
+- the protected fresh tail is unaffected
+- `true` keeps the most relevant older items for the current prompt
+- `false` falls back to pure chronological retention for the older prefix
+
+Env override:
+
+- `LCM_PROMPT_AWARE_EVICTION_ENABLED`
+
 ### `leafChunkTokens`
 
 See high-impact settings above.

@@ -398,6 +430,24 @@ Default:
 
 - `3`
 
+#### `cacheAwareCompaction.criticalBudgetPressureRatio`
+
+Fraction of the token budget at which deferred compaction bypasses hot-cache delay.
+
+Why it matters:
+
+- lets prompt-mutating deferred compaction run before the runtime falls back to emergency overflow handling
+- preserves cache-aware throttling below the pressure threshold
+- can be set to `1` to disable this pressure bypass
+
+Default:
+
+- `0.70`
+
+Env override:
+
+- `LCM_CRITICAL_BUDGET_PRESSURE_RATIO`
+
 ### `dynamicLeafChunkTokens`
 
 #### `dynamicLeafChunkTokens.enabled`
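Putting the release's two new knobs together, a minimal user-config override might look like the fragment below. This is a sketch assembled only from keys shown in the diffed docs; the exact nesting of the surrounding config file is an assumption.

```
{
  "promptAwareEviction": true,
  "cacheAwareCompaction": {
    "enabled": true,
    "criticalBudgetPressureRatio": 0.70
  }
}
```

Both can also be driven by the documented env overrides, `LCM_PROMPT_AWARE_EVICTION_ENABLED` and `LCM_CRITICAL_BUDGET_PRESSURE_RATIO`.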
|