@martian-engineering/lossless-claw 0.1.0

# Architecture

This document describes how lossless-claw works internally — the data model, compaction lifecycle, context assembly, and expansion system.

## Data model

### Conversations and messages

Every OpenClaw session maps to a **conversation**. The first time a session ingests a message, LCM creates a conversation record keyed by the runtime session ID.

Messages are stored with:

- **seq** — Monotonically increasing sequence number within the conversation
- **role** — `user`, `assistant`, `system`, or `tool`
- **content** — Plain text extraction of the message
- **tokenCount** — Estimated token count (~4 chars/token)
- **createdAt** — Insertion timestamp
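The ~4 chars/token heuristic can be sketched as follows (a minimal illustration of the estimate described above; the actual estimator may differ):

```typescript
// Rough token estimate used for budgeting decisions.
// Assumes the ~4 characters-per-token heuristic; the real
// implementation may use a proper tokenizer instead.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}
```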

Each message also has **message_parts** — structured content blocks that preserve the original shape (text blocks, tool calls, tool results, reasoning, file content, etc.). This allows the assembler to reconstruct rich content when building model context, not just flat text.

### The summary DAG

Summaries form a directed acyclic graph with two node types:

**Leaf summaries** (depth 0, kind `"leaf"`):

- Created from a chunk of raw messages
- Linked to source messages via `summary_messages`
- Contain a narrative summary with timestamps
- Typically 800–1200 tokens

**Condensed summaries** (depth 1+, kind `"condensed"`):

- Created from a chunk of summaries at the same depth
- Linked to parent summaries via `summary_parents`
- Each depth tier uses a progressively more abstract prompt
- Typically 1500–2000 tokens

Every summary carries:

- **summaryId** — `sum_` + 16 hex chars (SHA-256 of content + timestamp)
- **conversationId** — Which conversation it belongs to
- **depth** — Position in the hierarchy (0 = leaf)
- **earliestAt / latestAt** — Time range of source material
- **descendantCount** — Total number of descendant summaries (transitive)
- **fileIds** — References to large files mentioned in the source
- **tokenCount** — Estimated tokens
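The ID scheme above can be sketched like this (an illustrative reconstruction of "`sum_` + 16 hex chars of SHA-256 over content + timestamp", not necessarily the exact implementation):

```typescript
import { createHash } from "node:crypto";

// Derive a summary ID from the summary content plus a timestamp:
// "sum_" followed by the first 16 hex characters of a SHA-256 digest.
function makeSummaryId(content: string, createdAt: Date): string {
  const digest = createHash("sha256")
    .update(content)
    .update(createdAt.toISOString())
    .digest("hex");
  return `sum_${digest.slice(0, 16)}`;
}
```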

### Context items

The **context_items** table maintains the ordered list of what the model sees for each conversation. Each entry is either a message reference or a summary reference, identified by ordinal.

When compaction creates a summary from a range of messages (or summaries), the source items are replaced by a single summary item. This keeps the context list compact while preserving ordering.
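The replacement step can be sketched in memory as follows (a simplified model of the context_items list; the real code performs this in SQLite):

```typescript
type ContextItem =
  | { kind: "message"; id: number }
  | { kind: "summary"; id: string };

// Replace a contiguous range [start, end] of context items with a single
// summary item, preserving the order of everything around the range.
function replaceRangeWithSummary(
  items: ContextItem[],
  start: number,
  end: number,
  summaryId: string,
): ContextItem[] {
  return [
    ...items.slice(0, start),
    { kind: "summary", id: summaryId },
    ...items.slice(end + 1),
  ];
}
```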

## Compaction lifecycle

### Ingestion

When OpenClaw processes a turn, it calls the context engine's lifecycle hooks:

1. **bootstrap** — On session start, reconciles the JSONL session file with the LCM database. Imports any messages that exist in the file but not in LCM (crash recovery).
2. **ingest** / **ingestBatch** — Persists new messages to the database and appends them to context_items.
3. **afterTurn** — After the model responds, ingests new messages, then evaluates whether compaction should run.

### Leaf compaction

The **leaf pass** converts raw messages into leaf summaries:

1. Identify the oldest contiguous chunk of raw messages outside the **fresh tail** (protected recent messages).
2. Cap the chunk at `leafChunkTokens` (default 20k tokens).
3. Concatenate message content with timestamps.
4. Resolve the most recent prior summary for continuity (passed as `previous_context` so the LLM avoids repeating known information).
5. Send the chunk to the LLM with the leaf prompt.
6. If the summary comes back larger than the input (an LLM failure), retry with the aggressive prompt. If it is still too large, fall back to deterministic truncation.
7. Persist the summary, link it to the source messages, and replace the message range in context_items.

### Condensation

The **condensed pass** merges summaries at the same depth into a higher-level summary:

1. Find the shallowest depth with enough contiguous same-depth summaries (≥ `leafMinFanout` for d0, ≥ `condensedMinFanout` for d1+).
2. Concatenate their content with time range headers.
3. Send to the LLM with the depth-appropriate prompt (d1, d2, or d3+).
4. Apply the same escalation strategy (normal → aggressive → truncation fallback).
5. Persist with depth = targetDepth + 1, link to the parent summaries, and replace the range in context_items.

### Compaction modes

**Incremental (after each turn):**

- Checks whether raw tokens outside the fresh tail exceed `leafChunkTokens`
- If so, runs one leaf pass
- If `incrementalMaxDepth > 0`, follows with condensation passes up to that depth
- Best-effort: failures don't break the conversation

**Full sweep (manual `/compact` or overflow):**

- Phase 1: Repeatedly runs leaf passes until no eligible chunks remain
- Phase 2: Repeatedly runs condensation passes starting from the shallowest eligible depth
- Each pass checks for progress and stops if no tokens were saved

**Budget-targeted (`compactUntilUnder`):**

- Runs up to `maxRounds` (default 10) full sweeps
- Stops when the context is under the target token count
- Used by the overflow recovery path

### Three-level escalation

Every summarization attempt follows this escalation:

1. **Normal** — Standard prompt, temperature 0.2
2. **Aggressive** — Tighter prompt requesting only durable facts, temperature 0.1, lower target tokens
3. **Fallback** — Deterministic truncation to ~512 tokens with a `[Truncated for context management]` marker

This ensures compaction always makes progress, even if the LLM produces poor output.
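The escalation can be sketched as a guarded loop (illustrative only; `summarize` stands in for the real LLM call, and the fallback truncation is simplified to a character cut):

```typescript
type Summarizer = (text: string, mode: "normal" | "aggressive") => string;

// Try the normal prompt, then the aggressive one; if both produce output
// at least as large as the input, fall back to deterministic truncation
// so that compaction is guaranteed to make progress.
function summarizeWithEscalation(
  input: string,
  summarize: Summarizer,
  truncateChars = 2048, // ~512 tokens at ~4 chars/token
): string {
  for (const mode of ["normal", "aggressive"] as const) {
    const out = summarize(input, mode);
    if (out.length < input.length) return out;
  }
  return input.slice(0, truncateChars) + "\n[Truncated for context management]";
}
```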

## Context assembly

The assembler runs before each model turn and builds the message array:

```
[summary₁, summary₂, ..., summaryₙ, message₁, message₂, ..., messageₘ]
├── budget-constrained ──┤ ├──── fresh tail (always included) ────┤
```

### Steps

1. Fetch all context_items ordered by ordinal.
2. Resolve each item — summaries become user messages with XML wrappers; messages are reconstructed from parts.
3. Split into an evictable prefix and a protected fresh tail (the last `freshTailCount` raw messages).
4. Compute the fresh tail token cost (always included, even if over budget).
5. Fill the remaining budget from the evictable set, keeping newest items and dropping oldest.
6. Normalize assistant content to array blocks (Anthropic API compatibility).
7. Sanitize tool-use/result pairing (ensures every tool_result has a matching tool_use).
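Steps 3–5 can be sketched as follows (a simplified model that assumes per-item token counts are precomputed; the real assembler works over resolved context items):

```typescript
interface Item {
  tokens: number;
}

// Always include the fresh tail; then fill the remaining budget from the
// evictable prefix, preferring the newest items. Once an item no longer
// fits, everything older than it is dropped, preserving contiguity.
function fillBudget<T extends Item>(
  evictable: T[],
  freshTail: T[],
  budget: number,
): T[] {
  const tailCost = freshTail.reduce((n, it) => n + it.tokens, 0);
  let remaining = budget - tailCost;
  const kept: T[] = [];
  for (let i = evictable.length - 1; i >= 0; i--) {
    if (evictable[i].tokens <= remaining) {
      kept.unshift(evictable[i]);
      remaining -= evictable[i].tokens;
    } else {
      break; // drop this item and everything older
    }
  }
  return [...kept, ...freshTail];
}
```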

### XML summary format

Summaries are presented to the model as user messages wrapped in XML:

```xml
<summary id="sum_abc123" kind="leaf" depth="0" descendant_count="0"
         earliest_at="2026-02-17T07:37:00" latest_at="2026-02-17T08:23:00">
  <content>
    ...summary text with timestamps...

    Expand for details about: exact error messages, full config diff, intermediate debugging steps
  </content>
</summary>
```

Condensed summaries also include parent references:

```xml
<summary id="sum_def456" kind="condensed" depth="1" descendant_count="8" ...>
  <parents>
    <summary_ref id="sum_aaa111" />
    <summary_ref id="sum_bbb222" />
  </parents>
  <content>...</content>
</summary>
```

The XML attributes give the model enough metadata to reason about summary age, scope, and how to drill deeper. The `<parents>` section enables targeted expansion of specific source summaries.

## Expansion system

When summaries are too compressed for a task, agents use `lcm_expand_query` to recover detail.

### How it works

1. The agent calls `lcm_expand_query` with a `prompt` and either `summaryIds` or a `query`.
2. If `query` is provided, `lcm_grep` finds matching summaries first.
3. A **delegation grant** is created, scoping the sub-agent to the relevant conversation(s) with a token cap.
4. A sub-agent session is spawned with the expansion task.
5. The sub-agent walks the DAG: it can read summary content, follow parent links, access source messages, and inspect stored files.
6. The sub-agent returns a focused answer (default ≤ 2000 tokens) with cited summary IDs.
7. The grant is revoked and the sub-agent session is cleaned up.

### Security model

Expansion uses a delegation grant system:

- **Grants** are created at spawn time, scoped to specific conversation IDs
- **Token caps** limit how much content the sub-agent can access
- **TTL** ensures grants expire even if cleanup fails
- **Revocation** happens on completion, cancellation, or sweep

The sub-agent only gets `lcm_expand` (the low-level tool), not `lcm_expand_query` — preventing recursive sub-agent spawning.

## Large file handling

Files embedded in user messages (typically via `<file>` blocks from tool output) are checked at ingestion:

1. Parse file blocks from the message content.
2. For each block exceeding `largeFileTokenThreshold` (default 25k tokens):
   - Generate a unique file ID (`file_` prefix)
   - Store the content at `~/.openclaw/lcm-files/<conversation_id>/<file_id>.<ext>`
   - Generate a ~200 token exploration summary (structural analysis, key sections, etc.)
   - Insert a `large_files` record with metadata
   - Replace the file block in the message with a compact reference
3. The `lcm_describe` tool can retrieve full file content by ID.

This prevents a single large file paste from consuming the entire context window while keeping the content accessible.
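The extraction decision can be sketched like this (illustrative only; the token estimate, the wording of the compact reference, and the function name are assumptions, not the exact implementation):

```typescript
const LARGE_FILE_TOKEN_THRESHOLD = 25_000; // default largeFileTokenThreshold

// Decide whether a file block should be externalized to disk, and if so,
// produce the compact reference that replaces it in the message.
function maybeExternalize(
  fileId: string,
  content: string,
): { externalize: boolean; replacement: string } {
  const tokens = Math.ceil(content.length / 4); // ~4 chars/token heuristic
  if (tokens <= LARGE_FILE_TOKEN_THRESHOLD) {
    return { externalize: false, replacement: content };
  }
  return {
    externalize: true,
    replacement: `[Large file stored as ${fileId}; use lcm_describe to retrieve it]`,
  };
}
```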

## Session reconciliation

LCM handles crash recovery through **bootstrap reconciliation**:

1. On session start, read the JSONL session file (OpenClaw's ground truth).
2. Compare it against the LCM database.
3. Find the most recent message that exists in both (the "anchor").
4. Import any messages after the anchor that are in the JSONL file but not in LCM.

This handles the case where OpenClaw wrote messages to the session file but crashed before LCM could persist them.
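The anchor-and-import logic can be sketched over message IDs (a simplification; the real reconciliation compares full message records, not bare IDs):

```typescript
// Given the JSONL file's ordered message IDs (ground truth) and the IDs
// already persisted in the LCM database, return the IDs that still need
// importing: everything after the most recent ID present in both sources.
function messagesToImport(jsonlIds: string[], dbIds: Set<string>): string[] {
  let anchor = -1;
  for (let i = jsonlIds.length - 1; i >= 0; i--) {
    if (dbIds.has(jsonlIds[i])) {
      anchor = i; // most recent message known to both
      break;
    }
  }
  // Defensive filter: skip anything somehow already persisted.
  return jsonlIds.slice(anchor + 1).filter((id) => !dbIds.has(id));
}
```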

## Operation serialization

All mutating operations (ingest, compact) are serialized per-session using a promise queue. This prevents races between concurrent afterTurn/compact calls for the same conversation without blocking operations on different conversations.
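A minimal per-key promise queue in this style looks roughly like the following (a sketch of the pattern, not LCM's actual code):

```typescript
// Serialize async operations per key: calls sharing a key run one after
// another, while calls with different keys proceed independently.
class KeyedQueue {
  private tails = new Map<string, Promise<unknown>>();

  run<T>(key: string, task: () => Promise<T>): Promise<T> {
    const tail = this.tails.get(key) ?? Promise.resolve();
    // Chain after the current tail; run the task whether the previous
    // operation succeeded or failed, so one failure can't poison the queue.
    const next = tail.then(task, task);
    this.tails.set(key, next.catch(() => undefined));
    return next;
  }
}
```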

## Authentication

LCM needs to call an LLM for summarization. It resolves credentials through a three-tier cascade:

1. **Explicit API key** — If provided in legacy params
2. **Environment variables** — Standard provider env vars (`ANTHROPIC_API_KEY`, etc.)
3. **Auth profiles** — OpenClaw's OAuth/token/API-key profile system (`auth-profiles.json`)

For OAuth providers (e.g., Anthropic via Claude Max), LCM handles token refresh and credential persistence automatically.

# Configuration guide

## Quick start

Add LCM to your OpenClaw config:

```json
{
  "plugins": {
    "paths": ["/path/to/lossless-claw"],
    "slots": {
      "contextEngine": "lossless-claw"
    }
  }
}
```

Set recommended environment variables:

```bash
export LCM_FRESH_TAIL_COUNT=32
export LCM_INCREMENTAL_MAX_DEPTH=1
```

Restart OpenClaw.

## Tuning guide

### Context threshold

`LCM_CONTEXT_THRESHOLD` (default `0.75`) controls when compaction triggers, as a fraction of the model's context window.

- **Lower values** (e.g., 0.5) trigger compaction earlier, keeping context smaller but doing more LLM calls for summarization.
- **Higher values** (e.g., 0.85) let conversations grow longer before compacting, reducing summarization cost but risking overflow with large model responses.

For most use cases, 0.75 is a good balance.
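Concretely, the trigger point is just threshold × window size (the 200k window below is a hypothetical figure for illustration):

```typescript
// Compaction triggers once estimated context usage crosses this many tokens.
function compactionTriggerTokens(contextWindow: number, threshold: number): number {
  return Math.floor(contextWindow * threshold);
}
```

With a hypothetical 200k-token window and the default 0.75, compaction starts once the context reaches about 150k tokens.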

### Fresh tail count

`LCM_FRESH_TAIL_COUNT` (default `32`) is the number of most recent messages that are never compacted. These raw messages give the model immediate conversational continuity.

- **Smaller values** (e.g., 8–16) save context space for summaries but may lose recent nuance.
- **Larger values** (e.g., 32–64) give better continuity at the cost of a larger mandatory context floor.

For coding conversations with tool calls (which generate many messages per logical turn), 32 is recommended.

### Leaf fanout

`LCM_LEAF_MIN_FANOUT` (default `8`) is the minimum number of raw messages that must be available outside the fresh tail before a leaf pass runs.

- Lower values create summaries more frequently (more, smaller summaries).
- Higher values create larger, more comprehensive summaries less often.

### Condensed fanout

`LCM_CONDENSED_MIN_FANOUT` (default `4`) controls how many same-depth summaries accumulate before they're condensed into a higher-level summary.

- Lower values create deeper DAGs with more levels of abstraction.
- Higher values keep the DAG shallower, with more nodes at each level.

### Incremental max depth

`LCM_INCREMENTAL_MAX_DEPTH` (default `0`) controls whether condensation happens automatically after leaf passes.

- **0** — Only leaf summaries are created incrementally. Condensation happens only during a manual `/compact` or on overflow.
- **1** — After each leaf pass, attempt to condense d0 summaries into d1. A good default for active conversations.
- **2+** — Deeper automatic condensation. Rarely needed; the full sweep handles this during overflow.

### Summary target tokens

`LCM_LEAF_TARGET_TOKENS` (default `1200`) and `LCM_CONDENSED_TARGET_TOKENS` (default `2000`) control the target size of generated summaries.

- Larger targets preserve more detail but consume more context space.
- Smaller targets are more aggressive, losing detail faster.

The actual summary size depends on the LLM's output; these values are guidelines passed in the prompt's token target instruction.

### Leaf chunk tokens

`LCM_LEAF_CHUNK_TOKENS` (default `20000`) caps the amount of source material per leaf compaction pass.

- Larger chunks create more comprehensive summaries from more material.
- Smaller chunks create summaries more frequently from less material.
- This value also sets the condensed minimum input threshold (10% of it).

## Model selection

LCM uses the same model as the parent OpenClaw session for summarization by default. You can override this:

```bash
# Use a specific model for summarization
export LCM_SUMMARY_MODEL=anthropic/claude-sonnet-4-20250514
export LCM_SUMMARY_PROVIDER=anthropic
```

Using a cheaper/faster model for summarization can reduce costs, but quality matters — poor summaries compound as they're condensed into higher-level nodes.

## Database management

The SQLite database lives at `LCM_DATABASE_PATH` (default `~/.openclaw/lcm.db`).

### Inspecting the database

```bash
# Count conversations
sqlite3 ~/.openclaw/lcm.db "SELECT COUNT(*) FROM conversations;"

# See context items for a conversation
sqlite3 ~/.openclaw/lcm.db "SELECT * FROM context_items WHERE conversation_id = 1 ORDER BY ordinal;"

# Check summary depth distribution
sqlite3 ~/.openclaw/lcm.db "SELECT depth, COUNT(*) FROM summaries GROUP BY depth;"

# Find large summaries
sqlite3 ~/.openclaw/lcm.db "SELECT summary_id, depth, token_count FROM summaries ORDER BY token_count DESC LIMIT 10;"
```

### Backup

The database is a single file. Back it up with:

```bash
cp ~/.openclaw/lcm.db ~/.openclaw/lcm.db.backup
```

Or use SQLite's online backup (note `$HOME` in double quotes: the shell expands it before sqlite3 sees the path, since the sqlite3 shell does not expand `~` in dot-commands):

```bash
sqlite3 ~/.openclaw/lcm.db ".backup $HOME/.openclaw/lcm.db.backup"
```

## Per-agent configuration

In multi-agent OpenClaw setups, each agent uses the same LCM database but has its own conversations (keyed by session ID). The plugin config applies globally; per-agent overrides use environment variables set in the agent's config.

## Disabling LCM

To fall back to OpenClaw's built-in compaction:

```json
{
  "plugins": {
    "slots": {
      "contextEngine": "legacy"
    }
  }
}
```

Or set `LCM_ENABLED=false` to disable the plugin while keeping it registered.