npm - @martian-engineering/lossless-claw - Versions diffs - 0.7.0 → 0.8.1 - Mend

@martian-engineering/lossless-claw 0.7.0 → 0.8.1

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (54)

package/README.md +19 -3
package/dist/index.js +19240 -0
package/docs/agent-tools.md +9 -4
package/docs/configuration.md +24 -5
package/openclaw.plugin.json +27 -3
package/package.json +7 -6
package/skills/lossless-claw/SKILL.md +3 -2
package/skills/lossless-claw/references/architecture.md +12 -0
package/skills/lossless-claw/references/config.md +37 -0
package/skills/lossless-claw/references/diagnostics.md +13 -0
package/index.ts +0 -2
package/src/assembler.ts +0 -1188
package/src/compaction.ts +0 -1756
package/src/db/config.ts +0 -345
package/src/db/connection.ts +0 -141
package/src/db/features.ts +0 -42
package/src/db/migration.ts +0 -746
package/src/engine.ts +0 -4306
package/src/expansion-auth.ts +0 -365
package/src/expansion-policy.ts +0 -303
package/src/expansion.ts +0 -383
package/src/integrity.ts +0 -600
package/src/large-files.ts +0 -546
package/src/lcm-log.ts +0 -37
package/src/openclaw-bridge.ts +0 -22
package/src/plugin/index.ts +0 -1960
package/src/plugin/lcm-command.ts +0 -765
package/src/plugin/lcm-doctor-apply.ts +0 -542
package/src/plugin/lcm-doctor-shared.ts +0 -210
package/src/plugin/shared-init.ts +0 -59
package/src/prune.ts +0 -391
package/src/retrieval.ts +0 -363
package/src/session-patterns.ts +0 -23
package/src/startup-banner-log.ts +0 -49
package/src/store/compaction-telemetry-store.ts +0 -156
package/src/store/conversation-store.ts +0 -929
package/src/store/fts5-sanitize.ts +0 -50
package/src/store/full-text-fallback.ts +0 -83
package/src/store/full-text-sort.ts +0 -21
package/src/store/index.ts +0 -39
package/src/store/parse-utc-timestamp.ts +0 -25
package/src/store/summary-store.ts +0 -1519
package/src/summarize.ts +0 -1511
package/src/tools/common.ts +0 -53
package/src/tools/lcm-conversation-scope.ts +0 -127
package/src/tools/lcm-describe-tool.ts +0 -245
package/src/tools/lcm-expand-query-tool.ts +0 -831
package/src/tools/lcm-expand-tool.delegation.ts +0 -580
package/src/tools/lcm-expand-tool.ts +0 -453
package/src/tools/lcm-expansion-recursion-guard.ts +0 -373
package/src/tools/lcm-grep-tool.ts +0 -228
package/src/transaction-mutex.ts +0 -136
package/src/transcript-repair.ts +0 -301
package/src/types.ts +0 -165

package/docs/agent-tools.md CHANGED

@@ -24,7 +24,7 @@ Summaries are lossy by design. The "Expand for details about:" footer at the end
 - Tool call sequences and their outputs
 - Verbatim quotes or specific data points
-`lcm_expand_query` is bounded (~120s, scoped sub-agent) and relatively cheap. Don't ration it.
+`lcm_expand_query` is bounded (~120s, scoped sub-agent) and relatively cheap. Don't ration it, but use `lcm_grep` first when you need broad discovery across many sessions.
 ## Tool reference
@@ -114,6 +114,8 @@ lcm_describe(id: "file_789abc012345")
 Answer a focused question by expanding summaries through the DAG. Spawns a bounded sub-agent that walks parent links down to source material and returns a compact answer.
+When `allConversations: true` is set, `lcm_expand_query` can now synthesize one answer across multiple conversations. That cross-conversation mode is bounded, not exhaustive: it ranks conversation buckets, expands only the top few, and marks the result truncated when lower-ranked buckets are skipped or fail.
 **Parameters:**
 | Param | Type | Required | Default | Description |
@@ -130,9 +132,11 @@ Answer a focused question by expanding summaries through the DAG. Spawns a bound
 **Returns:**
 - `answer` — The focused answer text
 - `citedIds` — Summary IDs that contributed to the answer
+- `sourceConversationIds` — Conversations that were successfully expanded
 - `expandedSummaryCount` — How many summaries were expanded
 - `totalSourceTokens` — Total tokens read from the DAG
 - `truncated` — Whether the answer was truncated to fit maxTokens
+- `conversationBreakdown` — Optional per-conversation success/failure diagnostics for bounded multi-conversation runs
 **Examples:**
@@ -149,7 +153,7 @@ lcm_expand_query(
   prompt: "What were the exact file changes?"
 )
-# Cross-conversation search
+# Cross-conversation synthesis
 lcm_expand_query(
   query: "deployment procedure",
   prompt: "What's the current deployment process?",
@@ -175,7 +179,7 @@ Add instructions to your agent's system prompt so it knows when to use LCM tools
 Use LCM tools for recall:
 1. `lcm_grep` — Search all conversations by keyword/regex. Prefer `mode: "full_text"` for topic recall, quote exact phrases, use `sort: "relevance"` for older-topic lookups, and `sort: "hybrid"` when recency should still matter.
 2. `lcm_describe` — Inspect a specific summary (cheap, no sub-agent)
-3. `lcm_expand_query` — Deep recall with sub-agent expansion
+3. `lcm_expand_query` — Deep recall with bounded sub-agent expansion
 When summaries in context have an "Expand for details about:" footer
 listing something you need, use `lcm_expand_query` to get the full detail.
@@ -183,7 +187,7 @@ listing something you need, use `lcm_expand_query` to get the full detail.
 ### Conversation scoping
-By default, tools operate on the current conversation. Use `allConversations: true` to search across all of them (all agents, all sessions). Use `conversationId` to target a specific conversation you already know about (from previous grep results).
+By default, tools operate on the current conversation. Use `lcm_grep(..., allConversations: true)` when you need broad global discovery. Use `lcm_expand_query(..., allConversations: true)` when you want bounded synthesis across sessions. Use `conversationId` when you already know the exact conversation to inspect or expand.
 ### Performance considerations
@@ -191,3 +195,4 @@ By default, tools operate on the current conversation. Use `allConversations: tr
 - `lcm_expand_query` spawns a sub-agent and takes ~30–120 seconds
 - The sub-agent has a 120-second timeout with cleanup guarantees
 - Token caps (`LCM_MAX_EXPAND_TOKENS`) prevent runaway expansion
+- Cross-conversation `lcm_expand_query` expands only a bounded set of top-ranked conversations
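
The bounded cross-conversation mode described in this hunk (rank conversation buckets, expand only the top few, mark the result truncated when buckets are skipped or fail) can be pictured with a minimal sketch. This is not the package's implementation: `expandTopBuckets`, `ConversationBucket`, the `status` values, and the synchronous `expand` callback are all hypothetical. Only the result field names `sourceConversationIds`, `truncated`, and `conversationBreakdown` come from the documented return shape.

```typescript
// Hypothetical sketch of bounded multi-conversation expansion.
// The real tool spawns a sub-agent; a synchronous callback stands in for it here.
interface ConversationBucket {
  conversationId: number;
  score: number; // assumed relevance score, higher ranks first
}

interface BreakdownEntry {
  conversationId: number;
  status: "expanded" | "failed" | "skipped"; // assumed status labels
}

interface BoundedResult {
  sourceConversationIds: number[];
  truncated: boolean;
  conversationBreakdown: BreakdownEntry[];
}

function expandTopBuckets(
  buckets: ConversationBucket[],
  maxBuckets: number,
  expand: (conversationId: number) => boolean, // returns false on failure
): BoundedResult {
  // Rank by score without mutating the caller's array.
  const ranked = [...buckets].sort((a, b) => b.score - a.score);
  const sourceConversationIds: number[] = [];
  const conversationBreakdown: BreakdownEntry[] = [];
  let truncated = false;

  ranked.forEach((bucket, index) => {
    const { conversationId } = bucket;
    if (index >= maxBuckets) {
      // Lower-ranked buckets are skipped, so the answer is not exhaustive.
      conversationBreakdown.push({ conversationId, status: "skipped" });
      truncated = true;
      return;
    }
    let ok = false;
    try {
      ok = expand(conversationId);
    } catch {
      ok = false; // a thrown error counts as a failed bucket
    }
    if (ok) {
      sourceConversationIds.push(conversationId);
      conversationBreakdown.push({ conversationId, status: "expanded" });
    } else {
      conversationBreakdown.push({ conversationId, status: "failed" });
      truncated = true;
    }
  });

  return { sourceConversationIds, truncated, conversationBreakdown };
}
```

Under these assumptions, a caller can treat `truncated: true` as a signal to fall back to `lcm_grep` for broader discovery, which matches the guidance added in this file.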

package/docs/configuration.md CHANGED

@@ -16,11 +16,13 @@ Most installations only need to override a handful of keys. If you want a comple
 {
   "enabled": true,
   "databasePath": "/Users/alice/.openclaw/lcm.db",
+  "largeFilesDir": "/Users/alice/.openclaw/lcm-files",
   "ignoreSessionPatterns": [],
   "statelessSessionPatterns": [],
   "skipStatelessSessions": true,
   "contextThreshold": 0.75,
   "freshTailCount": 64,
+  "freshTailMaxTokens": 24000,
   "newSessionRetainDepth": 2,
   "leafMinFanout": 8,
   "condensedMinFanout": 4,
@@ -42,6 +44,7 @@ Most installations only need to override a handful of keys. If you want a comple
   "summaryTimeoutMs": 60000,
   "timezone": "America/Los_Angeles",
   "pruneHeartbeatOk": false,
+  "transcriptGcEnabled": false,
   "maxAssemblyTokenBudget": 30000,
   "summaryMaxOverageFactor": 3,
   "customInstructions": "",
@@ -65,6 +68,7 @@ Notes on the example:
 - Values shown are the runtime defaults when a fixed default exists.
 - `databasePath` shows the expanded default path shape. Use an absolute path in config rather than `~`.
+- `largeFilesDir` shows the expanded default path shape. Both `databasePath` and `largeFilesDir` default to paths under `OPENCLAW_STATE_DIR` (which in turn falls back to `~/.openclaw`).
 - `timezone` has no fixed hardcoded default; at runtime it resolves from `TZ` first, then the system timezone. The example uses `America/Los_Angeles`.
 - `maxAssemblyTokenBudget` has no default. The example uses `30000` as a realistic cap for a 32k-class model.
 - `databasePath` is the preferred key. `dbPath` is an accepted alias.
@@ -97,14 +101,18 @@ openclaw plugins install --link /path/to/lossless-claw
 | Key | Type | Default | Env override | Purpose |
 | --- | --- | --- | --- | --- |
 | `enabled` | `boolean` | `true` | `LCM_ENABLED` | Enables or disables lossless-claw without uninstalling it. |
-| `databasePath` | `string` | `${HOME}/.openclaw/lcm.db` | `LCM_DATABASE_PATH` | Preferred path for the SQLite database. |
+| `databasePath` | `string` | `${OPENCLAW_STATE_DIR}/lcm.db` | `LCM_DATABASE_PATH` | Preferred path for the SQLite database. |
 | `dbPath` | `string` | alias of `databasePath` | `LCM_DATABASE_PATH` | Legacy alias for `databasePath`. Prefer `databasePath` in new config. |
+| `largeFilesDir` | `string` | `${OPENCLAW_STATE_DIR}/lcm-files` | `LCM_LARGE_FILES_DIR` | Directory where large-file text payloads are persisted. Automatically follows the active state directory. |
 | `ignoreSessionPatterns` | `string[]` | `[]` | `LCM_IGNORE_SESSION_PATTERNS` | Session-key glob patterns that skip LCM entirely. |
 | `statelessSessionPatterns` | `string[]` | `[]` | `LCM_STATELESS_SESSION_PATTERNS` | Session-key glob patterns that may read from LCM but never write to it. |
 | `skipStatelessSessions` | `boolean` | `true` | `LCM_SKIP_STATELESS_SESSIONS` | Enforces `statelessSessionPatterns` when enabled. |
 | `newSessionRetainDepth` | `integer` | `2` | `LCM_NEW_SESSION_RETAIN_DEPTH` | Controls what survives `/new`. `-1` keeps all context, `0` keeps summaries only, higher values keep only deeper summaries. |
 | `timezone` | `string` | `TZ` or system timezone | `TZ` | IANA timezone used for timestamp rendering in summaries. |
 | `pruneHeartbeatOk` | `boolean` | `false` | `LCM_PRUNE_HEARTBEAT_OK` | Retroactively removes `HEARTBEAT_OK` turn cycles from persisted storage. |
+| `transcriptGcEnabled` | `boolean` | `false` | `LCM_TRANSCRIPT_GC_ENABLED` | Enables transcript rewrite GC during `maintain()`; disabled by default so transcript rewrites stay opt-in. |
+> **Multi-profile note:** `OPENCLAW_STATE_DIR` (set by the host OpenClaw gateway) controls where state is stored. When two gateways run on the same host (e.g. separate bot personas), each gateway sets its own `OPENCLAW_STATE_DIR` and lossless-claw automatically uses that directory for the database, large-file payloads, auth-profile lookups, and legacy secrets — no per-profile plugin config is needed.
 ### Compaction thresholds and summary sizing
@@ -112,6 +120,7 @@ openclaw plugins install --link /path/to/lossless-claw
 | --- | --- | --- | --- | --- |
 | `contextThreshold` | `number` | `0.75` | `LCM_CONTEXT_THRESHOLD` | Fraction of the active model context window that triggers compaction. |
 | `freshTailCount` | `integer` | `64` | `LCM_FRESH_TAIL_COUNT` | Number of newest messages always kept raw. |
+| `freshTailMaxTokens` | `integer` | unset | `LCM_FRESH_TAIL_MAX_TOKENS` | Optional token cap for the protected fresh tail. The newest message is always preserved even if it exceeds the cap. |
 | `leafMinFanout` | `integer` | `8` | `LCM_LEAF_MIN_FANOUT` | Minimum number of raw messages required before a leaf pass runs. |
 | `condensedMinFanout` | `integer` | `4` | `LCM_CONDENSED_MIN_FANOUT` | Number of same-depth summaries needed before condensation is attempted. |
 | `condensedMinFanoutHard` | `integer` | `2` | `LCM_CONDENSED_MIN_FANOUT_HARD` | Hard floor for condensation grouping during maintenance and repair flows. |
@@ -191,6 +200,15 @@ Compaction summarization resolves candidates in this order:
 If `summaryModel` already contains a provider prefix such as `anthropic/claude-sonnet-4-20250514`, `summaryProvider` is ignored for that candidate.
+Runtime-managed OAuth providers are supported here too. In particular, `openai-codex` and `github-copilot` auth profiles can be used for summary and expansion calls without a separate API key.
+A practical starting point for cost-sensitive setups is:
+```env
+LCM_SUMMARY_MODEL=openai/gpt-5.4-mini
+LCM_EXPANSION_MODEL=openai/gpt-5.4-mini
+```
 ### Session pattern matching
 `ignoreSessionPatterns` and `statelessSessionPatterns` use full session keys.
@@ -228,16 +246,17 @@ These settings are not part of `plugins.entries.lossless-claw.config`, but they
 | Env var | Default | Purpose |
 | --- | --- | --- |
+| `OPENCLAW_STATE_DIR` | `~/.openclaw` | Active state directory for the OpenClaw gateway. When set, all path defaults (database, large files, auth profiles, secrets) resolve relative to this directory instead of `~/.openclaw`. Set automatically by OpenClaw for non-default profiles. |
 | `LCM_TUI_CONVERSATION_WINDOW_SIZE` | `200` | Number of messages `lcm-tui` loads per keyset-paged conversation window. |
 ## Database operations
-The SQLite database lives at `databasePath` or `LCM_DATABASE_PATH`. The default path is `${HOME}/.openclaw/lcm.db`.
+The SQLite database lives at `databasePath` or `LCM_DATABASE_PATH`. The default path is `${OPENCLAW_STATE_DIR}/lcm.db` (resolves to `~/.openclaw/lcm.db` when `OPENCLAW_STATE_DIR` is not set).
 Inspect it with:
 ```bash
-sqlite3 ~/.openclaw/lcm.db
+sqlite3 "${OPENCLAW_STATE_DIR:-$HOME/.openclaw}/lcm.db"
 SELECT COUNT(*) FROM conversations;
 SELECT * FROM context_items WHERE conversation_id = 1 ORDER BY ordinal;
@@ -248,8 +267,8 @@ SELECT summary_id, depth, token_count FROM summaries ORDER BY token_count DESC L
 Back it up with:
 ```bash
-cp ~/.openclaw/lcm.db ~/.openclaw/lcm.db.backup
-sqlite3 ~/.openclaw/lcm.db ".backup ~/.openclaw/lcm.db.backup"
+cp "${OPENCLAW_STATE_DIR:-$HOME/.openclaw}/lcm.db" "${OPENCLAW_STATE_DIR:-$HOME/.openclaw}/lcm.db.backup"
+sqlite3 "${OPENCLAW_STATE_DIR:-$HOME/.openclaw}/lcm.db" ".backup ${OPENCLAW_STATE_DIR:-$HOME/.openclaw}/lcm.db.backup"
 ```
 ## Disabling lossless-claw
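
The path defaults introduced in this file (`databasePath` and `largeFilesDir` resolving under `OPENCLAW_STATE_DIR`, which falls back to `~/.openclaw`) can be summarized in a short sketch. The function name `resolveLcmPaths` and its parameters are hypothetical, and the precedence shown (environment override first, then config key, then default) is an assumption inferred from the "Env override" column rather than something this diff states.

```typescript
// Hypothetical sketch of the documented path defaults; not the package's code.
interface PathConfig {
  databasePath?: string;
  dbPath?: string; // legacy alias of databasePath
  largeFilesDir?: string;
}

type Env = Record<string, string | undefined>;

function resolveLcmPaths(config: PathConfig, env: Env, homeDir: string) {
  // State directory: OPENCLAW_STATE_DIR when set, else ~/.openclaw.
  const stateDir = env["OPENCLAW_STATE_DIR"] || `${homeDir}/.openclaw`;

  // Assumed precedence: env override, preferred key, legacy alias, default.
  const databasePath =
    env["LCM_DATABASE_PATH"] ||
    config.databasePath ||
    config.dbPath ||
    `${stateDir}/lcm.db`;

  const largeFilesDir =
    env["LCM_LARGE_FILES_DIR"] ||
    config.largeFilesDir ||
    `${stateDir}/lcm-files`;

  return { stateDir, databasePath, largeFilesDir };
}
```

With two gateways on one host, each call would receive a different `OPENCLAW_STATE_DIR` and therefore a separate database and large-file directory, which is the behavior the multi-profile note describes.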

package/openclaw.plugin.json CHANGED

@@ -1,5 +1,6 @@
 {
   "id": "lossless-claw",
+  "kind": "context-engine",
   "skills": [
     "skills/lossless-claw"
   ],
@@ -20,6 +21,10 @@
       "label": "Fresh Tail Count",
       "help": "Number of recent messages protected from compaction"
     },
+    "freshTailMaxTokens": {
+      "label": "Fresh Tail Max Tokens",
+      "help": "Optional token cap for the protected fresh tail; the newest message is always preserved"
+    },
     "leafChunkTokens": {
       "label": "Leaf Chunk Tokens",
       "help": "Maximum source tokens per leaf compaction chunk before summarization"
@@ -58,11 +63,15 @@
     },
     "dbPath": {
       "label": "Database Path",
-      "help": "Path to LCM SQLite database (default: ~/.openclaw/lcm.db)"
+      "help": "Path to LCM SQLite database (default: <OPENCLAW_STATE_DIR>/lcm.db; falls back to ~/.openclaw/lcm.db)"
     },
     "databasePath": {
       "label": "Database Path",
-      "help": "Path to LCM SQLite database (preferred key; alias of dbPath)"
+      "help": "Path to LCM SQLite database (preferred key; alias of dbPath, default: <OPENCLAW_STATE_DIR>/lcm.db)"
+    },
+    "largeFilesDir": {
+      "label": "Large Files Directory",
+      "help": "Directory for persisting large-file text payloads (default: <stateDir>/lcm-files). Uses OPENCLAW_STATE_DIR when set."
     },
     "ignoreSessionPatterns": {
       "label": "Ignored Sessions",
@@ -168,6 +177,10 @@
       "label": "Prune HEARTBEAT_OK",
       "help": "Retroactively delete HEARTBEAT_OK turn cycles from LCM storage"
     },
+    "transcriptGcEnabled": {
+      "label": "Transcript GC",
+      "help": "Enable transcript rewrite GC during maintain(); disabled by default"
+    },
     "fallbackProviders": {
       "label": "Fallback Providers",
       "help": "Explicit fallback provider/model pairs for compaction summarization (e.g., [{\"provider\": \"anthropic\", \"model\": \"claude-haiku-4-5\"}])"
@@ -193,6 +206,10 @@
         "type": "integer",
         "minimum": 1
       },
+      "freshTailMaxTokens": {
+        "type": "integer",
+        "minimum": 0
+      },
       "leafChunkTokens": {
         "type": "integer",
         "minimum": 1
@@ -341,8 +358,15 @@
       "pruneHeartbeatOk": {
         "type": "boolean"
       },
+      "transcriptGcEnabled": {
+        "type": "boolean"
+      },
       "databasePath": {
-        "description": "Path to LCM SQLite database (alias for dbPath)",
+        "description": "Path to LCM SQLite database (preferred key; alias of dbPath, default: <OPENCLAW_STATE_DIR>/lcm.db)",
+        "type": "string"
+      },
+      "largeFilesDir": {
+        "description": "Directory for persisting large-file text payloads (default: <OPENCLAW_STATE_DIR>/lcm-files)",
         "type": "string"
       },
       "fallbackProviders": {
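
The schema additions in this file (`freshTailMaxTokens` as an integer with minimum 0, `transcriptGcEnabled` as a boolean, `largeFilesDir` as a string) amount to a few simple checks. The sketch below hand-rolls them for illustration; `validateNewKeys` is a hypothetical helper, and the host presumably validates against the JSON schema itself rather than code like this.

```typescript
// Hypothetical validator mirroring the three schema entries added in 0.8.1.
function validateNewKeys(config: Record<string, unknown>): string[] {
  const errors: string[] = [];

  const tailCap = config["freshTailMaxTokens"];
  if (
    tailCap !== undefined &&
    !(typeof tailCap === "number" && Number.isInteger(tailCap) && tailCap >= 0)
  ) {
    errors.push("freshTailMaxTokens must be an integer >= 0");
  }

  const gc = config["transcriptGcEnabled"];
  if (gc !== undefined && typeof gc !== "boolean") {
    errors.push("transcriptGcEnabled must be a boolean");
  }

  const filesDir = config["largeFilesDir"];
  if (filesDir !== undefined && typeof filesDir !== "string") {
    errors.push("largeFilesDir must be a string");
  }

  return errors; // an empty array means all three keys are acceptable
}
```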

package/package.json CHANGED

@@ -1,9 +1,9 @@
 {
   "name": "@martian-engineering/lossless-claw",
-  "version": "0.7.0",
+  "version": "0.8.1",
   "description": "Lossless Context Management plugin for OpenClaw — DAG-based conversation summarization with incremental compaction",
   "type": "module",
-  "main": "index.ts",
+  "main": "dist/index.js",
   "license": "MIT",
   "author": "Josh Lehman <josh@martian.engineering>",
   "keywords": [
@@ -16,14 +16,14 @@
     "dag"
   ],
   "scripts": {
+    "build": "esbuild index.ts --bundle --platform=node --target=node22 --format=esm --outfile=dist/index.js --external:openclaw --external:\"@mariozechner/*\"",
     "changeset": "changeset",
-    "release:verify": "npm test && npm pack --dry-run",
+    "release:verify": "npm run build && npm test && npm pack --dry-run",
     "test": "vitest run --dir test",
     "version-packages": "changeset version"
   },
   "files": [
-    "index.ts",
-    "src/**/*.ts",
+    "dist/",
     "skills/",
     "openclaw.plugin.json",
     "docs/",
@@ -38,6 +38,7 @@
   "devDependencies": {
     "@changesets/changelog-github": "^0.6.0",
     "@changesets/cli": "^2.30.0",
+    "esbuild": "^0.28.0",
     "typescript": "^5.7.0",
     "vitest": "^3.0.0"
   },
@@ -49,7 +50,7 @@
   },
   "openclaw": {
     "extensions": [
-      "./index.ts"
+      "./dist/index.js"
     ]
   },
   "repository": {

package/skills/lossless-claw/SKILL.md CHANGED

@@ -12,8 +12,9 @@ Start here:
 1. Confirm whether the user needs configuration help, diagnostics, recall-tool guidance, or session-lifecycle guidance.
 2. If they need a quick health check, tell them to run `/lossless` (`/lcm` is the shorter alias).
 3. If they suspect summary corruption or truncation, use `/lossless doctor`.
-4. If they ask how `/new` or `/reset` interacts with LCM, read the session-lifecycle reference before answering.
-5. Load the relevant reference file instead of improvising details from memory.
+4. If they want high-confidence junk/session cleanup guidance, use `/lossless doctor clean` before recommending any deletes.
+5. If they ask how `/new` or `/reset` interacts with LCM, read the session-lifecycle reference before answering.
+6. Load the relevant reference file instead of improvising details from memory.
 Reference map:

package/skills/lossless-claw/references/architecture.md CHANGED

@@ -50,3 +50,15 @@ It looks for known summary-health markers that indicate:
 - truncated summary artifacts near the end of stored content
 This gives users one place to answer the question “is my summary graph healthy?” without introducing a broader mutation surface.
+## What `/lcm doctor clean` tells you
+The cleaners flow is also diagnostic first.
+It reports high-confidence junk patterns that are structurally safe to review as standalone cleanup candidates, including:
+- archived subagent sessions
+- cron sessions
+- NULL-key orphaned subagent context conversations
+This keeps cleanup discovery separate from summary-health diagnostics while still using the same native command surface.

package/skills/lossless-claw/references/config.md CHANGED

@@ -43,6 +43,20 @@ Good starting range:
 - `32` to `64`
+### `freshTailMaxTokens`
+Optional token cap for the protected fresh tail.
+Why it matters:
+- Prevents a few huge tool results from making the "fresh" suffix effectively uncompactable.
+- Still preserves the newest message even if that single message exceeds the cap.
+Good starting range:
+- Leave unset unless large tool outputs are forcing avoidable cost or overflow.
+- Start around `12000` to `32000` when you want a softer, size-aware fresh tail.
 ### `leafChunkTokens`
 Caps how much raw material gets summarized into one leaf summary.
@@ -154,6 +168,7 @@ Why it matters:
 - useful for custom deployments, testing, or isolating environments
 - wrong path selection is a common reason operators think LCM is empty or not growing
+- the default resolves to `${OPENCLAW_STATE_DIR}/lcm.db` (falls back to `~/.openclaw/lcm.db`)
 ### `databasePath`
@@ -164,6 +179,15 @@ Why it matters:
 - this is the documented key new config should use
 - `dbPath` is still accepted for compatibility
+### `largeFilesDir`
+Directory for persisting large-file text payloads externalised from the transcript.
+Why it matters:
+- defaults to `${OPENCLAW_STATE_DIR}/lcm-files`; on multi-profile hosts each profile stores files in its own state directory automatically
+- override with `LCM_LARGE_FILES_DIR` or set `largeFilesDir` in plugin config when you want an explicit path
 ### `largeFileThresholdTokens`
 Threshold for externalizing oversized tool/file payloads out of the main transcript into large-file storage.
@@ -173,6 +197,15 @@ Why it matters:
 - lower values externalize more aggressively
 - higher values keep more payload inline but can bloat storage and compaction inputs
+### `transcriptGcEnabled`
+Controls whether `maintain()` rewrites transcript entries for already-externalized tool results.
+Why it matters:
+- keep this off unless you want transcript GC to mutate the live session file during maintenance
+- the default is `false`
 ## Compaction timing and shape
 ### `contextThreshold`
@@ -183,6 +216,10 @@ See high-impact settings above.
 See high-impact settings above.
+### `freshTailMaxTokens`
+See high-impact settings above.
 ### `leafChunkTokens`
 See high-impact settings above.
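
The interaction between `freshTailCount` and the new `freshTailMaxTokens` cap can be illustrated with a small sketch. The one documented guarantee is that the newest message is always preserved even if it alone exceeds the cap. How the count and cap combine otherwise is an assumption here, and `selectFreshTail` and `Message` are hypothetical names.

```typescript
// Hypothetical sketch: pick the protected fresh tail from newest to oldest.
interface Message {
  id: number;
  tokens: number;
}

function selectFreshTail(
  messages: Message[], // ordered oldest to newest
  freshTailCount: number,
  freshTailMaxTokens?: number, // unset means no token cap
): Message[] {
  const tail: Message[] = [];
  let usedTokens = 0;

  for (const message of [...messages].reverse()) {
    if (tail.length >= freshTailCount) break;

    const isNewest = tail.length === 0;
    const overCap =
      freshTailMaxTokens !== undefined &&
      usedTokens + message.tokens > freshTailMaxTokens;

    // The newest message is kept even when it alone exceeds the cap.
    if (overCap && !isNewest) break;

    tail.unshift(message); // keep oldest-to-newest order in the result
    usedTokens += message.tokens;
  }

  return tail;
}
```

Under these assumptions, one very large tool result at the end of the transcript shrinks the protected tail to that single message, leaving older messages eligible for compaction.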

package/skills/lossless-claw/references/diagnostics.md CHANGED

@@ -29,6 +29,19 @@ What it should help confirm:
 - whether truncation markers exist
 - which conversations are affected most
+### `/lossless doctor clean`
+Use this when the user wants read-only diagnostics for high-confidence junk patterns before any cleanup.
+It should help confirm:
+- whether archived subagent sessions are present
+- whether cron sessions are accumulating unexpectedly
+- whether NULL-key orphaned subagent conversations are present
+- which high-confidence filters match the most conversations and messages
+This command is read-only. Use it to identify likely cleanup candidates before taking any separate cleanup action.
 ## Interpreting common states
 ### `/lossless` tokens vs `/status` context

package/index.ts DELETED

@@ -1,2 +0,0 @@
-export { default } from "./src/plugin/index.js";
-export { buildCompleteSimpleOptions, shouldOmitTemperatureForApi } from "./src/plugin/index.js";