@martian-engineering/lossless-claw 0.9.4 → 0.11.0

Files changed (14)

package/README.md +10 -3
package/dist/index.js +329 -39
package/docs/agent-tools.md +6 -6
package/docs/architecture.md +17 -18
package/docs/compaction-redesign-map.md +243 -0
package/docs/configuration.md +40 -27
package/docs/focus-briefs-implementation-plan.md +240 -0
package/docs/tui.md +25 -1
package/doctor-contract-api.d.ts +34 -0
package/doctor-contract-api.js +349 -0
package/openclaw.plugin.json +54 -24
package/package.json +14 -21
package/skills/lossless-claw/references/config.md +114 -51
package/skills/lossless-claw/references/recall-tools.md +6 -0

package/docs/agent-tools.md CHANGED

@@ -32,7 +32,7 @@ Summaries are lossy by design. The "Expand for details about:" footer at the end
 Search across messages and/or summaries using regex or full-text search.
-Use `mode: "full_text"` for keyword or topical recall. Wrap exact multi-word phrases in quotes to preserve phrase matching. Keep the default `sort: "recency"` for recent events, switch to `sort: "relevance"` when looking for the best older match on a topic, and use `sort: "hybrid"` when you want relevance without giving up recency entirely.
+Use `mode: "full_text"` for keyword or topical recall. Full-text queries are not regexes: alternation (`A|B`), regex wildcards (`.*`), character classes (`[abc]`), and anchors (`^foo`, `foo$`) require `mode: "regex"`. Wrap exact multi-word phrases in quotes to preserve phrase matching. Keep the default `sort: "recency"` for recent events, switch to `sort: "relevance"` when looking for the best older match on a topic, and use `sort: "hybrid"` when you want relevance without giving up recency entirely.
 **Parameters:**
@@ -41,7 +41,7 @@ Use `mode: "full_text"` for keyword or topical recall. Wrap exact multi-word phr
 | `pattern` | string | ✅ | — | Search pattern |
 | `mode` | string | | `"regex"` | `"regex"` or `"full_text"` |
 | `scope` | string | | `"both"` | `"messages"`, `"summaries"`, or `"both"` |
-| `conversationId` | number | | current | Specific conversation to search |
+| `conversationId` | number | | current session family | Specific physical conversation to search |
 | `allConversations` | boolean | | `false` | Search all conversations |
 | `since` | string | | — | ISO timestamp lower bound |
 | `before` | string | | — | ISO timestamp upper bound |
@@ -81,7 +81,7 @@ Look up metadata and content for a specific summary or stored file.
 | Param | Type | Required | Default | Description |
 |-------|------|----------|---------|-------------|
 | `id` | string | ✅ | — | `sum_xxx` for summaries, `file_xxx` for files |
-| `conversationId` | number | | current | Scope to a specific conversation |
+| `conversationId` | number | | current session family | Scope to a specific physical conversation |
 | `allConversations` | boolean | | `false` | Allow cross-conversation lookups |
 **Returns for summaries:**
@@ -124,7 +124,7 @@ When `allConversations: true` is set, `lcm_expand_query` can now synthesize one
 | `query` | string | ✅* | — | Text query to find summaries (if no `summaryIds`) |
 | `summaryIds` | string[] | ✅* | — | Specific summary IDs to expand (if no `query`) |
 | `maxTokens` | number | | 2000 | Answer length cap |
-| `conversationId` | number | | current | Scope to a specific conversation |
+| `conversationId` | number | | current session family | Scope to a specific physical conversation |
 | `allConversations` | boolean | | `false` | Search across all conversations |
 *One of `query` or `summaryIds` is required.
@@ -177,7 +177,7 @@ Add instructions to your agent's system prompt so it knows when to use LCM tools
 ## Memory & Context
 Use LCM tools for recall:
-1. `lcm_grep` — Search all conversations by keyword/regex. Prefer `mode: "full_text"` for topic recall, quote exact phrases, use `sort: "relevance"` for older-topic lookups, and `sort: "hybrid"` when recency should still matter.
+1. `lcm_grep` — Search all conversations by keyword/regex. Prefer `mode: "full_text"` for short topic terms, use `mode: "regex"` for alternation or other regex syntax, quote exact phrases, use `sort: "relevance"` for older-topic lookups, and `sort: "hybrid"` when recency should still matter.
 2. `lcm_describe` — Inspect a specific summary (cheap, no sub-agent)
 3. `lcm_expand_query` — Deep recall with bounded sub-agent expansion
@@ -187,7 +187,7 @@ listing something you need, use `lcm_expand_query` to get the full detail.
 ### Conversation scoping
-By default, tools operate on the current conversation. Use `lcm_grep(..., allConversations: true)` when you need broad global discovery. Use `lcm_expand_query(..., allConversations: true)` when you want bounded synthesis across sessions. Use `conversationId` when you already know the exact conversation to inspect or expand.
+By default, tools operate on the current session family: the active conversation plus archived segments that share the same stable session identity. This keeps recall continuous across session rotation and `/reset` replacement rows without widening the search to unrelated sessions. Use `lcm_grep(..., allConversations: true)` when you need broad global discovery. Use `lcm_expand_query(..., allConversations: true)` when you want bounded synthesis across sessions. Use `conversationId` when you already know the exact physical conversation to inspect or expand.
 ### Performance considerations

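The mode guidance added to `agent-tools.md` above (full-text queries are not regexes) can be illustrated with a small helper. This is a sketch only: `pickGrepMode` and its detection heuristics are hypothetical and not part of lossless-claw; only the `mode` values `"regex"` and `"full_text"` come from the documented parameters.

```typescript
type GrepMode = "regex" | "full_text";

// Hypothetical helper (not a lossless-claw API): picks a `mode` for lcm_grep
// by checking for the regex-only syntax the docs list for full-text queries.
function pickGrepMode(pattern: string): GrepMode {
  const hasAlternation = pattern.includes("|");        // A|B
  const hasWildcard = pattern.includes(".*");          // regex wildcard
  const hasCharClass = /\[[^\]]+\]/.test(pattern);     // [abc]
  const hasAnchor = pattern.startsWith("^") || pattern.endsWith("$"); // ^foo, foo$
  return hasAlternation || hasWildcard || hasCharClass || hasAnchor
    ? "regex"
    : "full_text";
}
```

A caller would pass the result as the `mode` argument, so short topic terms and quoted phrases stay on full-text search, and anything with alternation or anchors goes to regex mode.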
package/docs/architecture.md CHANGED

@@ -56,7 +56,7 @@ When OpenClaw processes a turn, it calls the context engine's lifecycle hooks:
 1. **bootstrap** — On session start, reconciles the JSONL session file with the LCM database. Imports any messages that exist in the file but not in LCM (crash recovery).
 2. **ingest** / **ingestBatch** — Persists new messages to the database and appends them to context_items.
-3. **afterTurn** — After the model responds, ingests new messages, then evaluates whether compaction should run.
+3. **afterTurn** — After the model responds, ingests new messages, then evaluates whether `contextThreshold` requires compaction.
 ### Leaf compaction
@@ -66,9 +66,9 @@ The **leaf pass** converts raw messages into leaf summaries:
 2. Cap the chunk at `leafChunkTokens` (default 20k tokens).
 3. Concatenate message content with timestamps.
 4. Resolve the most recent prior summary for continuity (passed as `previous_context` so the LLM avoids repeating known information).
-5. Send to the LLM with the leaf prompt.
-6. Normalize provider response blocks (Anthropic/OpenAI text, output_text, and nested content/summary shapes) into plain text.
-7. If normalization is empty, log provider/model/block-type diagnostics and fall back to deterministic truncation.
+5. Send to OpenClaw's host-owned `runtime.llm.complete` capability with the leaf prompt.
+6. Normalize runtime LLM response text into plain text while preserving provider/model diagnostics from the host result.
+7. If normalization is empty, log provider/model diagnostics and fall back to deterministic truncation.
 8. If the summary is larger than the input (LLM failure), retry with the aggressive prompt. If still too large, fall back to deterministic truncation.
 9. Persist the summary, link to source messages, and replace the message range in context_items.
@@ -84,15 +84,16 @@ The **condensed pass** merges summaries at the same depth into a higher-level su
 ### Compaction modes
-**Incremental (after each turn):**
-- Checks if raw tokens outside the fresh tail exceed `leafChunkTokens`
-- If so, runs one leaf pass
-- If `incrementalMaxDepth != 0`, follows with condensation passes up to that depth (`-1` for unlimited)
-- Best-effort: failures don't break the conversation
+**Automatic threshold sweep (after each turn):**
+- Checks if the assembled context crosses `contextThreshold`
+- Below threshold, does not compact and does not record leaf debt
+- In deferred mode, records one `"threshold"` maintenance row for background, `maintain()`, or pre-assembly execution
+- In inline mode, runs a full sweep before `afterTurn()` completes
-**Full sweep (manual `/compact` or overflow):**
+**Full sweep (threshold, manual `/compact`, or overflow):**
 - Phase 1: Repeatedly runs leaf passes until no more eligible chunks
-- Phase 2: Repeatedly runs condensation passes starting from the shallowest eligible depth
+- Phase 2: If the summarized prefix is above `summaryPrefixTargetTokens`, repeatedly runs condensation passes starting from the shallowest eligible depth, respecting the preferred `sweepMaxDepth` (`0` for leaf-only, `-1` for unlimited)
+- Pressure phase: If summarized-prefix pressure remains, condensation may go beyond `sweepMaxDepth` using the hard fanout floor
 - Each pass checks for progress; stops if no tokens were saved
 **Budget-targeted (`compactUntilUnder`):**
@@ -206,6 +207,8 @@ LCM handles crash recovery through **bootstrap reconciliation**:
 2. Compare against the LCM database.
 3. Find the most recent message that exists in both (the "anchor").
 4. Import any messages after the anchor that are in JSONL but not in LCM.
+5. If an existing session key moves to a different transcript file and no anchor exists, treat the new file as a bounded transcript epoch and import its recoverable messages. The same flood cap used for tail reconciliation prevents large unrelated transcripts from being appended automatically.
+6. Advance the bootstrap checkpoint only after an overlap is found or a bounded epoch import succeeds. No-anchor reads that import nothing leave the old checkpoint in place so a later turn can retry.
 This handles the case where OpenClaw wrote messages to the session file but crashed before LCM could persist them.
@@ -213,12 +216,8 @@ This handles the case where OpenClaw wrote messages to the session file but cras
 All mutating operations (ingest, compact) are serialized per-session using a promise queue. This prevents races between concurrent afterTurn/compact calls for the same conversation without blocking operations on different conversations.
-## Authentication
+## Runtime LLM boundary
-LCM needs to call an LLM for summarization. It resolves credentials through a three-tier cascade:
+LCM needs model inference for summarization, but it does not resolve provider credentials, base URLs, or provider transport settings directly. Summarization calls go through OpenClaw's `runtime.llm.complete` capability, which owns model preparation, credential resolution, OAuth refresh, provider dispatch, and usage attribution.
-1. **Auth profiles** — OpenClaw's OAuth/token/API-key profile system (`auth-profiles.json`), checked in priority order
-2. **Environment variables** — Standard provider env vars (`ANTHROPIC_API_KEY`, etc.)
-3. **Custom provider key** — From models config (e.g., `models.json`)
-For OAuth providers (e.g., Anthropic via Claude Max), LCM handles token refresh and credential persistence automatically.
+Configured Lossless summary model overrides (`summaryModel`, `largeFileSummaryModel`, and `fallbackProviders`) are sent as runtime LLM model override requests. OpenClaw enforces those requests with `plugins.entries.lossless-claw.llm.allowModelOverride` and `plugins.entries.lossless-claw.llm.allowedModels`; denied overrides fail closed instead of silently falling back to a different model.

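The automatic threshold sweep described in the `architecture.md` changes above reduces to one decision per turn. The following is a minimal sketch under stated assumptions, not the engine's code: `decideAfterTurn` and the `AfterTurnAction` type are hypothetical, and it assumes the threshold is `contextThreshold * tokenBudget` and that "crosses" means strictly greater than.

```typescript
type CompactionMode = "deferred" | "inline";

type AfterTurnAction =
  | { kind: "none" }                               // below threshold: no compaction, no debt
  | { kind: "record-debt"; reason: "threshold" }   // deferred mode: one maintenance row
  | { kind: "full-sweep" };                        // inline mode: sweep before afterTurn() completes

// Hypothetical decision function; the parameter names mirror documented config keys.
function decideAfterTurn(
  assembledTokens: number,
  tokenBudget: number,
  contextThreshold: number,
  mode: CompactionMode,
): AfterTurnAction {
  // Assumption: the threshold is a fraction of the token budget.
  const limit = contextThreshold * tokenBudget;
  if (assembledTokens <= limit) {
    return { kind: "none" };
  }
  return mode === "inline"
    ? { kind: "full-sweep" }
    : { kind: "record-debt", reason: "threshold" };
}
```

Note that raw-history pressure does not appear in the inputs, which matches the removal of the leaf trigger from automatic scheduling.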
package/docs/compaction-redesign-map.md ADDED

@@ -0,0 +1,243 @@
+# Compaction Redesign Map
+Status: implementation pass
+Date: 2026-05-14
+Branch: `josh/compaction-redesign`
+## Goal
+Lossless should stop trying to infer whether a provider prompt cache is hot or cold before deciding whether to compact. The old cache-aware incremental strategy could not be made sound with the signals available to Lossless: by the time a low cache-read observation arrives, the provider has usually already rewritten the cache for that turn. Without a reliable `expiresAt` signal, Lossless cannot safely tell "cold before this turn" from "cold during this turn, now hot again."
+The new design is intentionally simpler:
+1. Do not run automatic incremental compaction from raw-history pressure.
+2. Let context grow in the assembled transcript until the configured threshold is crossed.
+3. When `contextThreshold` is crossed, run the existing full-sweep mechanism.
+4. Keep the fresh tail as the protected boundary for recent verbatim context.
+5. Reuse existing summary sizing and fanout configuration.
+6. Use a summarized-prefix pressure target only as the escape hatch when a preferred-depth sweep does not reduce enough context.
+## Implemented Decisions
+| Decision | Outcome |
+| --- | --- |
+| Automatic compaction trigger | `contextThreshold` only. |
+| Raw leaf trigger | Kept as a diagnostic/manual helper; removed from automatic scheduling. |
+| Deferred debt | New automatic debt uses only reason `"threshold"`. |
+| Cache hotness | No longer delays automatic threshold compaction. |
+| Legacy non-threshold debt | Revalidated against threshold, then swept or marked finished as obsolete. |
+| Full sweep trigger | No longer starts only because `evaluateLeafTrigger()` is true. |
+| Full sweep preferred depth | `compactFullSweep()` now respects `sweepMaxDepth` during routine condensation. |
+| Fresh tail | Kept. It remains independent of incremental compaction. |
+| Default leaf chunk size | Kept at 20k tokens. |
+| Deprecated depth key | `incrementalMaxDepth` remains accepted as an alias for `sweepMaxDepth`. |
+| Pressure escape hatch | `summaryPrefixTargetTokens` lets sweeps condense beyond the preferred depth when summarized context remains too large. |
+| `cacheAwareCompaction.*` | Still visible and accepted, but documented as deprecated compatibility config. |
+| `dynamicLeafChunkTokens.*` | Still visible and accepted, but documented as deprecated compatibility config. |
+| Engine `compactLeafAsync()` | Removed. Automatic and public engine compaction should go through threshold/full-sweep paths. |
+| `CompactionEngine.compactLeaf()` | Kept as a lower-level helper and for focused tests. |
+| Stable hot-cache orphan stripping | Removed with the cache-state-dependent assembly behavior. |
+## Current Lifecycle
+### Ingestion and After-Turn Scheduling
+`LcmContextEngine.afterTurn()` now follows one automatic policy:
+```text
+afterTurn -> ingest messages -> update telemetry -> evaluate contextThreshold
+if below threshold:
+  do not compact
+  do not record maintenance debt
+if threshold is crossed and mode is inline:
+  run threshold full sweep inline
+if threshold is crossed and mode is deferred:
+  record one threshold maintenance row
+  schedule the background drain
+```
+The raw-history leaf trigger is no longer part of this lifecycle. `evaluateLeafTrigger()` can still answer "is there enough old raw material for a leaf pass?", but that answer does not cause automatic maintenance.
+Relevant code:
+- `src/engine.ts`: `afterTurn()`
+- `src/engine.ts`: `recordDeferredCompactionDebt()`
+- `src/store/compaction-maintenance-store.ts`: one coalesced maintenance row per conversation
+- `src/compaction.ts`: `evaluateLeafTrigger()`
+### Deferred Debt
+Deferred maintenance still exists because threshold sweeps can be expensive and should often happen outside the critical response path.
+New automatic debt should always use:
+```text
+reason = "threshold"
+```
+When the debt drains, Lossless calls threshold full sweep via `executeCompactionCore({ compactionTarget: "threshold" })`. Prompt-cache telemetry and TTLs are not consulted. Session queue idleness remains relevant because compaction should not race active session work.
+Old databases may contain pending non-threshold debt from previous builds. The compatibility behavior is:
+- re-evaluate `contextThreshold` at consumption time
+- if the conversation is over threshold, run threshold full sweep
+- if it is under threshold, mark the old debt finished with a no-op legacy reason
+This clears obsolete maintenance rows without deleting persisted conversation data.
+Relevant code:
+- `src/engine.ts`: `drainDeferredCompactionDebtNow()`
+- `src/engine.ts`: `consumeDeferredCompactionDebt()`
+- `src/engine.ts`: `maintain()`
+- `src/engine.ts`: pre-assembly maintenance drain
+### Full Sweep
+`CompactionEngine.compact()` delegates to `compactFullSweep()`.
+The sweep has two phases:
+1. Leaf phase: repeatedly summarize the oldest raw chunks outside the fresh tail.
+2. Condensed phase: if summarized-prefix tokens exceed `summaryPrefixTargetTokens`, repeatedly summarize same-depth summary chunks, shallowest first.
+Routine threshold sweeps use `contextThreshold` to decide when to start compaction. Once started, the leaf phase runs until no eligible raw-message chunk remains outside the fresh tail. Condensation is controlled by `summaryPrefixTargetTokens`, not by total context pressure. Forced sweeps still stop when no eligible chunk remains or when a pass stops making token progress.
+`sweepMaxDepth` is the preferred source-depth cap for routine full-sweep condensation:
+- `0`: leaf summaries only
+- `1`: depth-0 summaries may condense into depth 1, then stop
+- `2`: depth 0 -> 1 and depth 1 -> 2 are allowed
+- `-1`: unlimited
+The cap is intentionally aspirational. If summary tokens outside the fresh tail exceed `summaryPrefixTargetTokens` after routine condensation, Lossless runs a pressure condensation phase that may go deeper using `condensedMinFanoutHard`.
+Relevant code:
+- `src/compaction.ts`: `compactFullSweep()`
+- `src/compaction.ts`: `selectOldestLeafChunk()`
+- `src/compaction.ts`: `selectShallowestCondensationCandidate()`
+- `src/compaction.ts`: `resolveSweepMaxDepth()`
+- `src/compaction.ts`: `resolveSummaryPrefixTargetTokens()`
+### Fresh Tail
+The fresh tail is not incremental compaction. It stays because it protects recent verbatim context and gives both assembly and compaction a stable boundary.
+The fresh tail:
+- is always included during assembly
+- is excluded from leaf summarization
+- may be capped by `freshTailMaxTokens`
+- still preserves the newest message even when that one message exceeds the cap
+Relevant code:
+- `src/assembler.ts`: `resolveFreshTailOrdinal()`
+- `src/assembler.ts`: `Assembler.assemble()`
+- `src/compaction.ts`: `resolveFreshTailOrdinal()`
+- `src/compaction.ts`: `countRawTokensOutsideFreshTail()`
+## Removed Automatic Policy
+The old `evaluateIncrementalCompaction()` path combined:
+- prompt-cache telemetry
+- hot/cold/unknown cache-state heuristics
+- cache TTL guesses
+- dynamic leaf chunk sizing
+- raw-history pressure outside the fresh tail
+- bounded cold-cache catch-up
+- hot-cache leaf-only behavior
+- budget-headroom gates
+That policy is removed from automatic scheduling. The important reason is not that each individual heuristic was unreasonable; it is that the combined decision depended on cache state that Lossless cannot reliably observe at the time it must decide whether to mutate the prompt prefix.
+## Config Semantics
+### Active Settings
+| Key | Role |
+| --- | --- |
+| `contextThreshold` | The only automatic compaction trigger. |
+| `proactiveThresholdCompactionMode` | Chooses inline vs deferred threshold full sweep. |
+| `freshTailCount` | Protects newest raw messages during assembly and compaction. |
+| `freshTailMaxTokens` | Optional cap for protected fresh-tail size. |
+| `leafChunkTokens` | Maximum raw material per leaf summary during sweep; default remains 20k. |
+| `leafMinFanout` | Minimum raw-message or depth-0 summary fanout for useful compaction. |
+| `condensedMinFanout` | Normal same-depth condensation grouping for depth 1+. |
+| `condensedMinFanoutHard` | Hard-trigger/repair condensation grouping. |
+| `sweepMaxDepth` | Preferred source-depth cap for routine threshold full sweep. |
+| `summaryPrefixTargetTokens` | Optional target for summarized-prefix tokens; pressure condensation may go deeper if this target is missed. |
+| `leafTargetTokens` | Leaf summary target. |
+| `condensedTargetTokens` | Condensed summary target. |
+### Deprecated Compatibility Settings
+| Key | Status |
+| --- | --- |
+| `incrementalMaxDepth` | Accepted as a deprecated alias for `sweepMaxDepth`. New config should use `sweepMaxDepth`. |
+| `cacheAwareCompaction.*` | Accepted and visible as deprecated config. It no longer changes automatic compaction decisions. |
+| `dynamicLeafChunkTokens.*` | Accepted and visible as deprecated config. Automatic compaction uses `leafChunkTokens` directly. |
+Keeping these settings visible avoids breaking existing OpenClaw config and gives operators an explicit deprecation signal instead of silently hiding known keys.
+## Stable Orphan Stripping Tradeoff
+The old cache-aware assembly path could preserve a stable hot-cache boundary by overriding tool-call orphan stripping at a previously observed ordinal. This was removed with the rest of the cache-state-dependent assembly behavior.
+Benefits of removal:
+- the assembled prompt no longer changes based on inferred cache hotness
+- assembly has fewer hidden stateful branches
+- prompt-prefix behavior is easier to reason about and test
+- cache telemetry remains diagnostic instead of controlling prompt mutation
+Cost of removal:
+- Lossless gives up one cache-oriented prefix-stability optimization for tool-call boundaries
+- in some hot-cache sessions, ordinary tool-pair repair may alter the prefix sooner than the old stable-boundary override would have
+The ordinary assembler still sanitizes tool-use/tool-result pairing, so this is a cache-efficiency tradeoff rather than a transcript-correctness tradeoff.
+## Test Coverage
+The implementation should cover:
+- below-threshold turns do not compact and do not record debt
+- threshold crossings record only `"threshold"` debt in deferred mode
+- inline mode runs threshold full sweep rather than leaf-trigger compaction
+- background drain consumes threshold debt without prompt-cache telemetry or TTL
+- `maintain()` consumes threshold debt without prompt-cache delay
+- pre-assembly drain consumes threshold debt without prompt-cache delay
+- legacy non-threshold debt is cleared when threshold no longer applies
+- legacy non-threshold debt is upgraded to threshold full sweep when threshold still applies
+- `compactFullSweep()` treats `sweepMaxDepth` as a preferred depth
+- `compactFullSweep()` pressure-condenses past `sweepMaxDepth` when threshold or summary-prefix pressure remains
+- the fresh tail remains verbatim and un-compacted
+Removed or rewritten coverage:
+- hot-cache delay gate tests
+- cold-cache catch-up tests
+- dynamic automatic leaf chunk tests
+- automatic leaf debt tests
+- engine-level `compactLeafAsync()` tests
+- stable hot-cache orphan-stripping tests
+## Non-Goals
+- Do not add a total-context target floor in this pass.
+- Do not remove persisted telemetry or maintenance tables.
+- Do not parallelize full-sweep leaf summaries yet. The current leaf prompt uses prior summary continuity, so parallelization would require a separate semantic design.
+- Do not depend on provider cache `expiresAt`.
+- Do not remove accepted deprecated config keys until a separate migration decision is made.
+## Follow-Up Watch Items
+1. If repeated threshold re-entry happens in live use, tune `summaryPrefixTargetTokens`, `contextThreshold`, `leafChunkTokens`, and fanout before adding a total-context target floor.
+2. If 20k leaf chunks make threshold sweeps too frequent, consider 30k before adding new mechanisms.
+3. If stable orphan stripping removal causes measurable cache regressions in tool-heavy sessions, revisit it as an assembly feature independent of cache-hotness inference.

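The `sweepMaxDepth` values listed in the redesign map above (`0` leaf-only, `1` for depth 0 -> 1, `2` for depth 0 -> 1 and 1 -> 2, `-1` unlimited) can be read as a check on the source depth of a condensation candidate. This is an illustrative sketch: `routineCondensationAllowed` is a hypothetical name, and it covers only routine condensation, not the pressure phase that may go deeper.

```typescript
// Hypothetical helper: may summaries at `sourceDepth` be condensed during a
// routine (non-pressure) full sweep, given the preferred cap `sweepMaxDepth`?
function routineCondensationAllowed(sourceDepth: number, sweepMaxDepth: number): boolean {
  if (sweepMaxDepth === -1) {
    return true; // -1 means unlimited depth
  }
  // A cap of N allows source depths 0..N-1, so a cap of 0 allows none (leaf-only).
  return sourceDepth < sweepMaxDepth;
}
```

With the default cap of `1`, depth-0 summaries may condense into depth 1 and routine condensation then stops. Anything deeper would come only from the pressure phase tied to `summaryPrefixTargetTokens`.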
package/docs/configuration.md CHANGED

@@ -24,12 +24,15 @@ Most installations only need to override a handful of keys. If you want a comple
   "freshTailCount": 64,
   "freshTailMaxTokens": 24000,
   "promptAwareEviction": false,
+  "stubLargeToolPayloads": false,
   "newSessionRetainDepth": 2,
   "leafMinFanout": 8,
   "condensedMinFanout": 4,
   "condensedMinFanoutHard": 2,
+  "sweepMaxDepth": 1,
   "incrementalMaxDepth": 1,
   "leafChunkTokens": 20000,
+  "summaryPrefixTargetTokens": 20000,
   "bootstrapMaxTokens": 6000,
   "leafTargetTokens": 2400,
   "condensedTargetTokens": 2000,
@@ -55,6 +58,7 @@ Most installations only need to override a handful of keys. If you want a comple
   "proactiveThresholdCompactionMode": "deferred",
   "autoRotateSessionFiles": {
     "enabled": true,
+    "createBackups": false,
     "sizeBytes": 2097152,
     "startup": "rotate",
     "runtime": "rotate"
@@ -66,7 +70,7 @@ Most installations only need to override a handful of keys. If you want a comple
     "hotCachePressureFactor": 4,
     "hotCacheBudgetHeadroomRatio": 0.2,
     "coldCacheObservationThreshold": 3,
-    "criticalBudgetPressureRatio": 0.70
+    "criticalBudgetPressureRatio": 0.90
   },
   "dynamicLeafChunkTokens": {
     "enabled": true,
@@ -82,6 +86,7 @@ Notes on the example:
 - `largeFilesDir` shows the expanded default path shape. Both `databasePath` and `largeFilesDir` default to paths under `OPENCLAW_STATE_DIR` (which in turn falls back to `~/.openclaw`).
 - `timezone` has no fixed hardcoded default; at runtime it resolves from `TZ` first, then the system timezone. The example uses `America/Los_Angeles`.
 - `maxAssemblyTokenBudget` has no default. The example uses `30000` as a realistic cap for a 32k-class model.
+- `summaryPrefixTargetTokens` has no fixed default. The example uses `20000`, which matches the derived default for large-context models with the default `leafChunkTokens`.
 - `databasePath` is the preferred key. `dbPath` is an accepted alias.
 - `largeFileThresholdTokens` is the preferred key. `largeFileTokenThreshold` is an accepted alias.
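The note on `summaryPrefixTargetTokens` refers to a derived default, which the settings table in this file gives as `max(condensedTargetTokens, min(leafChunkTokens, floor(contextThreshold * tokenBudget * 0.5)))`. A minimal sketch of that arithmetic follows; `deriveSummaryPrefixTarget` and the input interface are hypothetical names, and the `contextThreshold` and `tokenBudget` values in the usage lines are assumed for illustration only.

```typescript
interface PrefixTargetInputs {
  condensedTargetTokens: number;
  leafChunkTokens: number;
  contextThreshold: number; // fraction of the token budget
  tokenBudget: number;
}

// Hypothetical helper that evaluates the documented formula term by term.
function deriveSummaryPrefixTarget(i: PrefixTargetInputs): number {
  const halfThreshold = Math.floor(i.contextThreshold * i.tokenBudget * 0.5);
  return Math.max(i.condensedTargetTokens, Math.min(i.leafChunkTokens, halfThreshold));
}

// Assumed inputs: a 200k-token budget and a 0.75 threshold, with the documented
// defaults leafChunkTokens = 20000 and condensedTargetTokens = 2000.
const largeModel = deriveSummaryPrefixTarget({
  condensedTargetTokens: 2000,
  leafChunkTokens: 20000,
  contextThreshold: 0.75,
  tokenBudget: 200000,
});

// Same defaults with an assumed 32k-token budget.
const smallModel = deriveSummaryPrefixTarget({
  condensedTargetTokens: 2000,
  leafChunkTokens: 20000,
  contextThreshold: 0.75,
  tokenBudget: 32000,
});
```

Under these assumed inputs the large-budget case is capped by `leafChunkTokens`, which is consistent with the example value of `20000`, while the smaller budget yields a lower target.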
@@ -124,13 +129,14 @@ openclaw plugins install --link /path/to/lossless-claw
 | `transcriptGcEnabled` | `boolean` | `false` | `LCM_TRANSCRIPT_GC_ENABLED` | Enables transcript rewrite GC during `maintain()`; disabled by default so transcript rewrites stay opt-in. |
 | `proactiveThresholdCompactionMode` | `"deferred" \| "inline"` | `"deferred"` | `LCM_PROACTIVE_THRESHOLD_COMPACTION_MODE` | Controls whether proactive threshold compaction is deferred into maintenance debt by default or run inline for legacy behavior. |
 | `autoRotateSessionFiles.enabled` | `boolean` | `true` | `LCM_AUTO_ROTATE_SESSION_FILES_ENABLED` | Enables automatic rotation for oversized LCM-managed session JSONL files. |
+| `autoRotateSessionFiles.createBackups` | `boolean` | `false` | `LCM_AUTO_ROTATE_SESSION_FILES_CREATE_BACKUPS` | Creates or replaces the rolling `rotate-latest` SQLite backup before automatic session-file rotation. Manual `/lcm rotate` backups are always created. |
 | `autoRotateSessionFiles.sizeBytes` | `integer` | `2097152` | `LCM_AUTO_ROTATE_SESSION_FILES_SIZE_BYTES` | Byte threshold that triggers automatic session-file rotation. |
 | `autoRotateSessionFiles.startup` | `"rotate" \| "warn" \| "off"` | `"rotate"` | `LCM_AUTO_ROTATE_SESSION_FILES_STARTUP` | Startup behavior for oversized indexed OpenClaw session transcripts that also have active LCM bootstrap state. |
 | `autoRotateSessionFiles.runtime` | `"rotate" \| "warn" \| "off"` | `"rotate"` | `LCM_AUTO_ROTATE_SESSION_FILES_RUNTIME` | Runtime behavior after `afterTurn()` and `maintain()` check the current transcript size. |
 > **Multi-profile note:** `OPENCLAW_STATE_DIR` (set by the host OpenClaw gateway) controls where state is stored. When two gateways run on the same host (e.g. separate bot personas), each gateway sets its own `OPENCLAW_STATE_DIR` and lossless-claw automatically uses that directory for the database, large-file payloads, auth-profile lookups, and legacy secrets — no per-profile plugin config is needed.
-Automatic session-file rotation uses the same safe path as `/lcm rotate`: runtime rotation replaces the rolling `rotate-latest` SQLite backup, rewrites only the live session transcript, keeps the active LCM conversation and durable history intact, and refreshes the bootstrap checkpoint. Startup rotation first scans OpenClaw's current indexed session stores for configured agents, then intersects those candidates with active LCM conversations and matching bootstrap file mappings. If multiple startup candidates need rotation, one pre-rotation LCM database backup is created for the batch before any transcript is rewritten. Rotation never runs for ignored sessions, stateless sessions, or sessions without active LCM state. The preserved JSONL tail follows the existing rotate behavior, which is controlled by `freshTailCount`.
+Automatic session-file rotation rewrites only the live session transcript, keeps the active LCM conversation and durable history intact, and refreshes the bootstrap checkpoint. Startup rotation first scans OpenClaw's current indexed session stores for configured agents, then intersects those candidates with active LCM conversations and matching bootstrap file mappings. Automatic rotation does not create a SQLite backup by default; set `autoRotateSessionFiles.createBackups` to `true` to make runtime rotation replace the rolling `rotate-latest` backup and to make startup rotation create one pre-rotation LCM database backup for the batch before any transcript is rewritten. Manual `/lcm rotate` always keeps its backup-backed behavior regardless of this flag. Rotation never runs for ignored sessions, stateless sessions, or sessions without active LCM state. The preserved JSONL tail follows the existing rotate behavior, which is controlled by `freshTailCount`.
 Every automatic decision emits grep-able log lines prefixed with `[lcm] auto-rotate:`. Startup emits one compact summary line with `phase=startup`, `action=summary`, `scanned`, `eligible`, `rotated`, `warned`, `skipped`, `durationMs`, `bytesRemoved`, and backup fields when a batch backup was created; quiet skips such as missing files, missing bootstrap mappings, and below-threshold files are counted there instead of producing one line per candidate. Rotation detail lines include `phase`, `action`, `sessionId`, `sessionKey`, `sessionFile`, `sizeBytes`, `thresholdBytes`, `durationMs`, `backupPath`, `bytesRemoved`, `preservedTailMessageCount`, and `checkpointSize`; real warning lines include the same available context plus `reason` or `error`.
@@ -142,11 +148,14 @@ Every automatic decision emits grep-able log lines prefixed with `[lcm] auto-rot
 | `freshTailCount` | `integer` | `64` | `LCM_FRESH_TAIL_COUNT` | Number of newest messages always kept raw. |
 | `freshTailMaxTokens` | `integer` | unset | `LCM_FRESH_TAIL_MAX_TOKENS` | Optional token cap for the protected fresh tail. The newest message is always preserved even if it exceeds the cap. |
 | `promptAwareEviction` | `boolean` | `false` | `LCM_PROMPT_AWARE_EVICTION_ENABLED` | When enabled, budget-constrained assembly keeps older evictable items by prompt relevance instead of pure chronology. This improves retrieval under tight budgets, but it can reduce prompt-cache hit rates because the preserved prefix changes as prompts change. |
+| `stubLargeToolPayloads` | `boolean` | `false` | `LCM_STUB_LARGE_TOOL_PAYLOADS` | When enabled, evictable tool-result rows backfilled with `messages.large_content` are assembled as `[LCM Tool Output: file_xxx ...]` stubs while the fresh tail stays inline. Requires `scripts/lcm-blob-migrate.mjs`, which defaults to the same large-files root as runtime LCM (`LCM_LARGE_FILES_DIR` or `${OPENCLAW_STATE_DIR}/lcm-files`). |
 | `leafMinFanout` | `integer` | `8` | `LCM_LEAF_MIN_FANOUT` | Minimum number of raw messages required before a leaf pass runs. |
 | `condensedMinFanout` | `integer` | `4` | `LCM_CONDENSED_MIN_FANOUT` | Number of same-depth summaries needed before condensation is attempted. |
 | `condensedMinFanoutHard` | `integer` | `2` | `LCM_CONDENSED_MIN_FANOUT_HARD` | Hard floor for condensation grouping during maintenance and repair flows. |
-| `incrementalMaxDepth` | `integer` | `1` | `LCM_INCREMENTAL_MAX_DEPTH` | Maximum automatic condensation depth after leaf compaction. Use `0` for leaf-only and `-1` for unlimited depth. |
-| `leafChunkTokens` | `integer` | `20000` | `LCM_LEAF_CHUNK_TOKENS` | Maximum source-token budget for a leaf compaction chunk. |
+| `sweepMaxDepth` | `integer` | `1` | `LCM_SWEEP_MAX_DEPTH` | Preferred maximum condensation source depth during routine threshold sweeps. Use `0` for leaf-only and `-1` for unlimited depth. Pressure sweeps may go deeper when summarized context remains above target. |
+| `incrementalMaxDepth` | `integer` | alias of `sweepMaxDepth` | `LCM_INCREMENTAL_MAX_DEPTH` | Deprecated alias for `sweepMaxDepth`. Kept so existing configs continue to load. |
+| `leafChunkTokens` | `integer` | `20000` | `LCM_LEAF_CHUNK_TOKENS` | Maximum source-token budget for a leaf compaction chunk. Larger chunks reduce sweep frequency at the cost of slower individual summary calls. |
+| `summaryPrefixTargetTokens` | `integer` | derived | `LCM_SUMMARY_PREFIX_TARGET_TOKENS` | Optional target for summarized-prefix tokens after a full sweep. If unset, Lossless derives `max(condensedTargetTokens, min(leafChunkTokens, floor(contextThreshold * tokenBudget * 0.5)))`. |
 | `bootstrapMaxTokens` | `integer` | `max(6000, floor(leafChunkTokens * 0.3))` | `LCM_BOOTSTRAP_MAX_TOKENS` | Maximum parent-history tokens imported when a new LCM conversation bootstraps. |
 | `leafTargetTokens` | `integer` | `2400` | `LCM_LEAF_TARGET_TOKENS` | Prompt target for leaf summary size. |
 | `condensedTargetTokens` | `integer` | `2000` | `LCM_CONDENSED_TARGET_TOKENS` | Prompt target for condensed summary size. |
@@ -170,6 +179,8 @@ Every automatic decision emits grep-able log lines prefixed with `[lcm] auto-rot
 | `summaryTimeoutMs` | `integer` | `60000` | `LCM_SUMMARY_TIMEOUT_MS` | Maximum time to wait for one model-backed summarizer call. |
 | `customInstructions` | `string` | `""` | `LCM_CUSTOM_INSTRUCTIONS` | Extra natural-language instructions injected into every summarization prompt. |
+Summary calls are executed through OpenClaw's `api.runtime.llm.complete` capability. If you configure an explicit Lossless summary model (`summaryModel`, `largeFileSummaryModel`, or `fallbackProviders`), OpenClaw must allow that runtime LLM override under `plugins.entries.lossless-claw.llm.allowModelOverride` and `plugins.entries.lossless-claw.llm.allowedModels`. `openclaw doctor --fix` can add the minimal policy entries for configured Lossless summary models. Delegated expansion calls use OpenClaw's runtime sub-agent layer; explicit `expansionModel` values require `plugins.entries.lossless-claw.subagent.allowModelOverride` and a matching `subagent.allowedModels` entry, or `"*"` if you intentionally trust any expansion target. `openclaw doctor --fix` can add the minimal subagent policy, and `lcm_expand_query` retries once without the override if the host rejects it.
 ### Fallbacks, circuit breaking, and safety rails
 | Key | Type | Default | Env override | Purpose |
@@ -184,32 +195,33 @@ Every automatic decision emits grep-able log lines prefixed with `[lcm] auto-rot
 | Key | Type | Default | Env override | Purpose |
 | --- | --- | --- | --- | --- |
-| `cacheAwareCompaction.enabled` | `boolean` | `true` | `LCM_CACHE_AWARE_COMPACTION_ENABLED` | Defers incremental leaf compaction more aggressively when prompt-cache telemetry indicates a hot cache. |
-| `cacheAwareCompaction.cacheTTLSeconds` | `integer` | `300` | `LCM_CACHE_TTL_SECONDS` | Fallback cache TTL used when deferred Anthropic compaction has provider/model telemetry but no explicit runtime cache-retention window. |
-| `cacheAwareCompaction.maxColdCacheCatchupPasses` | `integer` | `2` | `LCM_MAX_COLD_CACHE_CATCHUP_PASSES` | Maximum bounded catch-up passes allowed in one maintenance cycle when cache telemetry is cold. |
-| `cacheAwareCompaction.hotCachePressureFactor` | `number` | `4` | `LCM_HOT_CACHE_PRESSURE_FACTOR` | Multiplier applied to the hot-cache leaf trigger before raw-history pressure overrides cache preservation. |
-| `cacheAwareCompaction.hotCacheBudgetHeadroomRatio` | `number` | `0.2` | `LCM_HOT_CACHE_BUDGET_HEADROOM_RATIO` | Minimum fraction of the real token budget that must remain free before hot-cache incremental compaction is skipped entirely. |
-| `cacheAwareCompaction.coldCacheObservationThreshold` | `integer` | `3` | `LCM_COLD_CACHE_OBSERVATION_THRESHOLD` | Consecutive cold observations required before non-explicit cache misses are treated as truly cold. This dampens one-off routing noise and provider failover blips. |
-| `cacheAwareCompaction.criticalBudgetPressureRatio` | `number` | `0.70` | `LCM_CRITICAL_BUDGET_PRESSURE_RATIO` | Fraction of the token budget at which deferred compaction bypasses hot-cache delay so prompt-mutating debt can run before overflow. Set to `1` to disable this bypass. |
+| `cacheAwareCompaction.enabled` | `boolean` | `true` | `LCM_CACHE_AWARE_COMPACTION_ENABLED` | Deprecated. Accepted for config compatibility but no longer used for automatic compaction decisions. |
+| `cacheAwareCompaction.cacheTTLSeconds` | `integer` | `300` | `LCM_CACHE_TTL_SECONDS` | Deprecated. Accepted for config compatibility; threshold debt no longer waits for cache TTL. |
+| `cacheAwareCompaction.maxColdCacheCatchupPasses` | `integer` | `2` | `LCM_MAX_COLD_CACHE_CATCHUP_PASSES` | Deprecated. Automatic cold-cache catch-up passes were removed. |
+| `cacheAwareCompaction.hotCachePressureFactor` | `number` | `4` | `LCM_HOT_CACHE_PRESSURE_FACTOR` | Deprecated. Hot-cache raw-history pressure no longer drives automatic compaction. |
+| `cacheAwareCompaction.hotCacheBudgetHeadroomRatio` | `number` | `0.2` | `LCM_HOT_CACHE_BUDGET_HEADROOM_RATIO` | Deprecated. Hot-cache budget headroom no longer defers automatic threshold compaction. |
+| `cacheAwareCompaction.coldCacheObservationThreshold` | `integer` | `3` | `LCM_COLD_CACHE_OBSERVATION_THRESHOLD` | Deprecated. Cold-cache streaks remain observable telemetry only. |
+| `cacheAwareCompaction.criticalBudgetPressureRatio` | `number` | `0.90` | `LCM_CRITICAL_BUDGET_PRESSURE_RATIO` | Deprecated. `contextThreshold` is the only automatic compaction threshold. |
 #### `dynamicLeafChunkTokens`
 | Key | Type | Default | Env override | Purpose |
 | --- | --- | --- | --- | --- |
-| `dynamicLeafChunkTokens.enabled` | `boolean` | `true` | `LCM_DYNAMIC_LEAF_CHUNK_TOKENS_ENABLED` | Enables dynamic working leaf chunk sizes for busier sessions. |
-| `dynamicLeafChunkTokens.max` | `integer` | `max(leafChunkTokens, floor(leafChunkTokens * 2))` | `LCM_DYNAMIC_LEAF_CHUNK_TOKENS_MAX` | Upper bound for the dynamic working chunk size. With the default `leafChunkTokens=20000`, this resolves to `40000`. |
+| `dynamicLeafChunkTokens.enabled` | `boolean` | `true` | `LCM_DYNAMIC_LEAF_CHUNK_TOKENS_ENABLED` | Deprecated. Accepted for config compatibility but no longer used by automatic compaction. |
+| `dynamicLeafChunkTokens.max` | `integer` | `max(leafChunkTokens, floor(leafChunkTokens * 2))` | `LCM_DYNAMIC_LEAF_CHUNK_TOKENS_MAX` | Deprecated. With the default `leafChunkTokens=20000`, this resolves to `40000`, but automatic compaction uses `leafChunkTokens`. |
+### Threshold full-sweep compaction
-### Cache-aware incremental compaction
+Automatic compaction is threshold-only:
-When cache-aware compaction is enabled:
+- `afterTurn()` evaluates `contextThreshold` against the active token budget
+- below threshold, no automatic compaction runs and no leaf debt is recorded
+- at or above threshold, inline mode runs a threshold full sweep immediately
+- deferred mode records one coalesced `"threshold"` maintenance row and drains it in the background, during `maintain()`, or before the next assembly
-- hot cache stretches the incremental leaf trigger to `dynamicLeafChunkTokens.max`
-- hot cache skips incremental maintenance entirely when the assembled context is still comfortably below the real token budget
-- hot cache also gets a short hysteresis window so one ambiguous turn does not immediately discard a recently healthy cache signal
-- cold cache still allows bounded catch-up passes via `cacheAwareCompaction.maxColdCacheCatchupPasses`
-- once `currentTokenCount >= criticalBudgetPressureRatio * tokenBudget`, deferred compaction bypasses hot-cache delay so prompt-mutating debt can run before emergency overflow handling
+Lossless still records prompt-cache telemetry for status and diagnostics, but cache hotness no longer delays threshold debt. Legacy `cacheAwareCompaction.*` and `dynamicLeafChunkTokens.*` settings remain accepted so existing OpenClaw config continues to load, but they do not change automatic compaction behavior.
-When incremental leaf compaction still runs on a hot cache, follow-on condensed passes are suppressed so the maintenance cycle only pays for the leaf pass that was explicitly justified.
+Full sweeps first run leaf passes until there are no more eligible raw-message chunks outside the fresh tail. Condensation is then driven by summarized-prefix pressure: the routine condensation phase obeys `sweepMaxDepth`, and if the summarized prefix still exceeds `summaryPrefixTargetTokens`, a pressure phase may use `condensedMinFanoutHard` and condense deeper. Total context pressure starts the sweep, but does not by itself force deeper condensation once the raw prefix has been summarized.
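The threshold-only decision and the derived `summaryPrefixTargetTokens` value can be sketched as follows. This is a minimal illustration, not the package's implementation: the interface, the function names, and the action labels are hypothetical, the `contextThreshold` of `0.75` and the token budgets are assumed example values, and the `>=` comparison against `contextThreshold * tokenBudget` is an assumed reading of "at or above threshold". Only the config key names and the derivation formula come from the documentation in this diff.

```typescript
// Hypothetical config shape; only the key names mirror the documented settings.
interface SweepConfig {
  contextThreshold: number; // fraction of the token budget (0.75 is an assumed example)
  leafChunkTokens: number; // documented default: 20000
  condensedTargetTokens: number; // documented default: 2000
  summaryPrefixTargetTokens?: number; // optional explicit target
  proactiveThresholdCompactionMode: "inline" | "deferred";
}

// Illustrative action labels, not identifiers from the package.
type AfterTurnAction = "none" | "inline-full-sweep" | "record-threshold-debt";

// Applies the documented derivation when no explicit target is configured:
// max(condensedTargetTokens, min(leafChunkTokens, floor(contextThreshold * tokenBudget * 0.5)))
function resolveSummaryPrefixTarget(cfg: SweepConfig, tokenBudget: number): number {
  if (cfg.summaryPrefixTargetTokens !== undefined) {
    return cfg.summaryPrefixTargetTokens;
  }
  const halfThreshold = Math.floor(cfg.contextThreshold * tokenBudget * 0.5);
  return Math.max(cfg.condensedTargetTokens, Math.min(cfg.leafChunkTokens, halfThreshold));
}

// Assumption: "at or above threshold" means currentTokens >= contextThreshold * tokenBudget.
function decideAfterTurn(
  cfg: SweepConfig,
  currentTokens: number,
  tokenBudget: number,
): AfterTurnAction {
  if (currentTokens < cfg.contextThreshold * tokenBudget) {
    return "none"; // below threshold: no compaction runs and no debt is recorded
  }
  return cfg.proactiveThresholdCompactionMode === "inline"
    ? "inline-full-sweep"
    : "record-threshold-debt";
}

const exampleConfig: SweepConfig = {
  contextThreshold: 0.75,
  leafChunkTokens: 20000,
  condensedTargetTokens: 2000,
  proactiveThresholdCompactionMode: "deferred",
};

console.log(resolveSummaryPrefixTarget(exampleConfig, 128000)); // 20000 (capped by leafChunkTokens)
console.log(resolveSummaryPrefixTarget(exampleConfig, 4000)); // 2000 (floored by condensedTargetTokens)
console.log(decideAfterTurn(exampleConfig, 96000, 128000)); // "record-threshold-debt"
```

With these example values, a large budget is capped by `leafChunkTokens` and a very small budget is floored by `condensedTargetTokens`, so the derived target always stays within those two settings.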
 ### Prompt-aware eviction
@@ -235,12 +247,12 @@ Compaction summarization resolves candidates in this order:
 1. `LCM_SUMMARY_MODEL` and `LCM_SUMMARY_PROVIDER`
 2. `plugins.entries.lossless-claw.config.summaryModel` and `summaryProvider`
 3. OpenClaw's default compaction model
-4. Legacy per-call provider and model hints
+4. Runtime/session provider and model hints from OpenClaw
 5. `fallbackProviders`
 If `summaryModel` already contains a provider prefix such as `anthropic/claude-sonnet-4-20250514`, `summaryProvider` is ignored for that candidate.
-Runtime-managed OAuth providers are supported here too. In particular, `openai-codex` and `github-copilot` auth profiles can be used for summary and expansion calls without a separate API key.
+Lossless does not resolve provider credentials directly for compaction summaries. OpenClaw's runtime LLM layer owns provider/model preparation, auth profiles, OAuth refresh, base URLs, and dispatch. Lossless only selects the requested summary target and passes it to the host runtime, where model override policy is enforced.
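The candidate order and the provider-prefix rule can be illustrated with a small sketch. Everything here is hypothetical scaffolding: the `ResolutionInputs` shape, the `hostDefaultCompactionModel` and `runtimeHint` fields, the element shape of `fallbackProviders`, and the function names are assumptions made for illustration. Applying the prefix split to every source is also an assumption, since the documentation states the rule only for `summaryModel`. The sketch only builds an ordered list of targets; per the documentation, credentials and dispatch stay with the host runtime.

```typescript
interface SummaryTarget {
  provider: string | undefined;
  model: string;
  source: string; // which resolution step produced this candidate
}

// Hypothetical input shape; only the env and config key names come from the docs.
interface ResolutionInputs {
  env: { LCM_SUMMARY_MODEL?: string; LCM_SUMMARY_PROVIDER?: string };
  pluginConfig: { summaryModel?: string; summaryProvider?: string };
  hostDefaultCompactionModel?: string; // assumed field name
  runtimeHint?: { provider?: string; model?: string }; // assumed field name
  fallbackProviders?: Array<{ provider: string; model: string }>; // assumed element shape
}

// A "provider/model" value carries its own provider, so the separate provider is ignored.
function toTarget(
  model: string | undefined,
  provider: string | undefined,
  source: string,
): SummaryTarget | undefined {
  if (!model) return undefined;
  const slash = model.indexOf("/");
  if (slash > 0) {
    return { provider: model.slice(0, slash), model: model.slice(slash + 1), source };
  }
  return { provider, model, source };
}

// Builds candidates in the documented order and drops steps that are not configured.
function resolveCandidates(inp: ResolutionInputs): SummaryTarget[] {
  const ordered: Array<SummaryTarget | undefined> = [
    toTarget(inp.env.LCM_SUMMARY_MODEL, inp.env.LCM_SUMMARY_PROVIDER, "env"),
    toTarget(inp.pluginConfig.summaryModel, inp.pluginConfig.summaryProvider, "pluginConfig"),
    toTarget(inp.hostDefaultCompactionModel, undefined, "hostDefault"),
    toTarget(inp.runtimeHint?.model, inp.runtimeHint?.provider, "runtimeHint"),
    ...(inp.fallbackProviders ?? []).map((f) => toTarget(f.model, f.provider, "fallbackProviders")),
  ];
  return ordered.filter((c): c is SummaryTarget => c !== undefined);
}

const candidates = resolveCandidates({
  env: {},
  pluginConfig: { summaryModel: "anthropic/claude-sonnet-4-20250514", summaryProvider: "ignored-provider" },
  fallbackProviders: [{ provider: "example-provider", model: "example-model" }],
});
console.log(candidates.map((c) => `${c.source}: ${c.provider}/${c.model}`));
```

In this example the plugin-config candidate resolves to provider `anthropic` because the model string already carries a prefix, and the placeholder fallback entry is appended last.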
 A practical starting point for cost-sensitive setups is:
@@ -285,11 +297,12 @@ This keeps long-term history available while still giving users a real clean-sla
 Lossless-claw now defaults `proactiveThresholdCompactionMode` to `deferred`.
 - deferred mode records a single coalesced maintenance debt row per conversation
-- deferred mode persists provider/model/cache telemetry so Anthropic-family sessions can avoid rewriting a still-hot prompt cache
-- `maintain()` can still process non-prompt-mutating work when the host explicitly opts in to deferred execution, but it leaves prompt-mutating debt pending while Anthropic cache is still hot
-- `assemble()` consumes deferred prompt-mutating debt pre-assembly once the cache is cold or the next turn is already approaching overflow
+- new deferred compaction debt is only created for `contextThreshold` pressure and uses reason `"threshold"`
+- `maintain()` consumes threshold debt when the host explicitly opts in to deferred execution
+- `assemble()` consumes pending threshold debt before building the next prompt
+- old non-threshold debt from earlier builds is revalidated; if the conversation is no longer over threshold, it is cleared as a no-op
 - `/lcm status` / `/lossless status` shows the current maintenance state, including pending/running/last-failure details
-- status output also surfaces the latest API/cache telemetry so operators can see whether a deferred debt item is being preserved for cache-safety reasons
+- status output also surfaces the latest API/cache telemetry as diagnostics, not as a deferral gate
 - set `proactiveThresholdCompactionMode` to `inline` only if you need the legacy inline proactive compaction behavior for compatibility
 ### `/lcm rotate`