@martian-engineering/lossless-claw 0.5.2 → 0.6.0

# Configuration

This reference covers the current `lossless-claw` config surface on `main`, based on `openclaw.plugin.json`.

`lossless-claw` is most effective when the operator understands which settings change compaction behavior and why.

## First checks

- Ensure the plugin is installed and enabled.
- Ensure the context-engine slot points at `lossless-claw` when you want it to own compaction.
- Run `/lossless` (`/lcm` alias) to confirm the plugin is active and see the live DB path.

## High-impact settings

These are the settings most operators should understand first.

### `contextThreshold`

Controls how full the model context can get before LCM compacts older material.

- Lower values compact earlier.
- Higher values compact later.

Why it matters:

- Too low increases summarization cost and churn.
- Too high risks hitting the model window with large tool output or long replies.

Good default:

- `0.75`

With a 200k-token context window, for example, `0.75` means compaction kicks in once roughly 150k tokens are in use; the exact window depends on the active model.

### `freshTailCount`

Keeps the newest messages raw instead of compacting them.

Why it matters:

- Higher values preserve near-term conversational nuance.
- Lower values free context budget sooner.

Good starting range:

- `32` to `64`

### `leafChunkTokens`

Caps how much raw material gets summarized into one leaf summary.

Why it matters:

- Larger chunks reduce summarization frequency.
- Smaller chunks create more summaries and more DAG fragmentation.

Use this when:

- Your summarizer is rate-limited or expensive.
- You want fewer but broader leaf summaries.

### `incrementalMaxDepth`

Controls how far automatic condensation cascades after leaf compaction.

Why it matters:

- `0` keeps only leaf summaries moving automatically.
- `1` is a practical default for long-running sessions.
- `-1` allows unlimited cascading, which can be useful for very long histories but is more aggressive.

### `summaryModel` and `summaryProvider`

Override the model used for compaction summarization.

Why they matter:

- Summary quality compounds upward in the DAG.
- Cheaper models can reduce cost, but weak summaries create weak recalled context later.

Guidance:

- Pick a cheaper model only if it remains reliably structured and faithful.
- `summaryProvider` only matters when `summaryModel` is a bare model name rather than a canonical provider/model ref.

### `expansionModel` and `expansionProvider`

Override the model used by delegated recall flows such as `lcm_expand_query`.

Why they matter:

- They let recall-heavy work use a different cost/latency profile than normal compaction.
- These are recall-path settings, not compaction-path settings.

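Putting the high-impact settings together, a config sketch might look like the following. The setting names match this reference; the exact nesting inside `openclaw.plugin.json` (or your runtime config), the `leafChunkTokens` value, and the model refs are illustrative assumptions, not documented defaults.

```json
{
  "contextThreshold": 0.75,
  "freshTailCount": 48,
  "leafChunkTokens": 4096,
  "incrementalMaxDepth": 1,
  "summaryModel": "anthropic/claude-haiku",
  "expansionModel": "anthropic/claude-sonnet"
}
```

Pick the chunk size to match your summarizer's cost profile, and use your runtime's canonical provider/model naming for the model refs.
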
## Complete config surface

## Core enablement and storage

### `enabled`

Boolean on/off switch for the plugin entry.

Use this when:

- you need the plugin installed but temporarily disabled
- you want to distinguish “installed” from “selected and active”

### `dbPath`

Overrides the SQLite DB location.

Why it matters:

- useful for custom deployments, testing, or isolating environments
- a wrong path is a common reason operators think LCM is empty or not growing

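A minimal enablement sketch, assuming the same flat key layout as above. The path shown is a hypothetical example, not a default:

```json
{
  "enabled": true,
  "dbPath": "/srv/openclaw/lcm/main.sqlite3"
}
```

If `/lossless` reports an unexpected DB path or a DB that never grows, check this value first.
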
### `largeFileThresholdTokens`

Threshold for externalizing oversized tool/file payloads out of the main transcript into large-file storage.

Why it matters:

- lower values externalize more aggressively
- higher values keep more payload inline but can bloat storage and compaction inputs

## Compaction timing and shape

### `contextThreshold`

See high-impact settings above.

### `freshTailCount`

See high-impact settings above.

### `leafChunkTokens`

See high-impact settings above.

### `leafMinFanout`

Minimum number of leaf items required before creating a leaf compaction grouping.

Why it matters:

- higher values avoid tiny leaf summaries
- lower values compact sooner but can create overly granular summaries

### `condensedMinFanout`

Preferred minimum fanout for condensed summaries during normal condensation.

Why it matters:

- controls how eagerly summaries get grouped upward
- affects DAG breadth and readability of higher-level summaries

### `condensedMinFanoutHard`

Hard lower bound for condensed fanout decisions.

Why it matters:

- acts as the guardrail when normal fanout preferences cannot be met cleanly
- mostly useful for advanced tuning or pathological summary-tree shapes

### `incrementalMaxDepth`

See high-impact settings above.

## Session-selection controls

### `ignoreSessionPatterns`

Glob-style session-key patterns that should never enter LCM.

Why it matters:

- keeps low-value automation or noisy sessions out of the DB
- useful for excluding certain agent lanes or ephemeral traffic entirely

### `statelessSessionPatterns`

Patterns for sessions that may read from LCM but should not write to it.

Why it matters:

- useful for sub-agents and ephemeral workers
- prevents recall helpers from polluting the main history

### `skipStatelessSessions`

Boolean that changes how stateless matches are treated.

Why it matters:

- when enabled, matching stateless sessions skip LCM persistence entirely
- use it carefully: it determines whether those sessions act as readers only or are bypassed for writes entirely

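A session-selection sketch. The pattern strings are hypothetical session-key shapes for illustration; use patterns that match how your deployment actually names sessions:

```json
{
  "ignoreSessionPatterns": ["cron:*", "healthcheck:*"],
  "statelessSessionPatterns": ["subagent:*"],
  "skipStatelessSessions": false
}
```

With this shape, `cron:*` traffic never enters LCM at all, while `subagent:*` sessions can read compacted history without writing their own turns into it.
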
## Recall-path and delegation controls

### `expansionModel`

See high-impact settings above.

### `expansionProvider`

See high-impact settings above.

### `delegationTimeoutMs`

Maximum time to wait for delegated recall to complete.

Why it matters:

- lower values fail faster under slow sub-agent paths
- higher values tolerate deeper recall but can make calls feel stuck for longer

### `maxAssemblyTokenBudget`

Hard ceiling on the assembled LCM token budget.

Why it matters:

- useful when the runtime model window is smaller than the surrounding system assumes
- can prevent oversized assembly on smaller-context models

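A recall-path sketch with placeholder values (60 seconds and 120k tokens are illustrative, not defaults):

```json
{
  "delegationTimeoutMs": 60000,
  "maxAssemblyTokenBudget": 120000
}
```

Set `maxAssemblyTokenBudget` comfortably below the smallest model window you expect the runtime to use.
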
## Summary quality and prompt controls

### `summaryMaxOverageFactor`

Maximum allowed overage factor before an oversized summary is truncated or downgraded.

Why it matters:

- guards against runaway summaries that are much larger than their target budget
- useful when summary models are verbose or unstable

### `customInstructions`

Natural-language instructions injected into summarization prompts.

Why it matters:

- lets operators steer formatting or emphasis without patching code
- should be used sparingly; low-quality instructions can degrade summary quality system-wide

## Practical operator workflow

1. Install and enable the plugin.
2. Set the context-engine slot to `lossless-claw`.
3. Start with conservative defaults.
4. Run `/lossless` after startup to confirm path, size, and summary health.
5. If recall feels weak, revisit `freshTailCount`, `leafChunkTokens`, and summarizer model quality before changing anything else.
6. Touch advanced knobs like fanout, large-file thresholds, custom instructions, and assembly caps only after a concrete symptom appears.

## Reading the status output

`/lossless` is the right command for LCM-local metrics.

Useful interpretation notes:

- `tokens in context` is the current frontier token count in the live LCM state.
- `compression ratio` is shown as a rounded `1:N`, which is easier to read than a tiny percentage for heavily compacted conversations.
- `/status` may still show a different context number because it reflects the runtime prompt that was actually assembled and sent on the last turn.
# Diagnostics

For the MVP, use the native command surface first.

## Fast path

### `/lossless` (`/lcm` alias)

Use this when you need a quick health snapshot.

It should answer:

- Is `lossless-claw` enabled?
- Is it selected as the context engine?
- Which DB is active?
- Is the DB growing as expected?
- Are summaries present?
- Are broken or truncated summaries present?

### `/lossless doctor`

Use this when summary corruption or truncation is suspected.

It is the single user-facing diagnostic entrypoint for summary-health issues in the MVP.

What it should help confirm:

- whether broken summaries exist
- whether truncation markers exist
- which conversations are affected most

## Interpreting common states

### `/lossless` tokens vs `/status` context

These numbers are related, but they are not the same metric.

- `/lossless` reports LCM-side conversation metrics such as the current frontier token count and compression ratio.
- `/status` reports the last assembled runtime prompt snapshot for the active model.

Why they can differ:

- runtime assembly can trim or omit frontier material before the request is sent
- model-specific token budgeting and packing happen after LCM frontier selection
- `/status` reflects a last-run snapshot, while `/lossless` reads live LCM state from the DB

Treat `/lossless` as the LCM health/shape view, and `/status` as the runtime request view.

### No summaries yet

Usually means one of:

- the conversation has not crossed compaction thresholds yet
- the plugin is not selected as the context engine
- writes are being skipped because the session matches stateless or ignored patterns

### DB exists but stays tiny

Usually means one of:

- the plugin is not receiving traffic
- the wrong DB path is configured
- the plugin is enabled but not selected

### Broken or truncated summaries detected

Treat this as a signal to inspect summary health before trusting compacted context heavily.

For MVP guidance:

- keep the user on `/lossless doctor`
- explain the count and affected conversations
- avoid advertising separate repair-vs-doctor command families

## Safe operator advice

- Do not guess exact historical details from compacted context alone.
- When a user wants a fact pattern verified, use recall tools to recover evidence.
- Prefer changing one configuration knob at a time and then re-checking `/lossless`.
# Recall Tools

Use recall tools when the question depends on exact historical evidence from compacted context.

## Tool selection

### `lcm_grep`

Use for:

- finding whether a term, file name, error string, or identifier appears in compacted history
- narrowing the search space before deeper inspection

Do not use it for:

- answering detail-heavy questions by itself

### `lcm_describe`

Use for:

- inspecting a specific summary or stored-file record by ID
- reading lineage and content for a known summary node

Do not use it for:

- broad discovery when you do not know the target ID yet

### `lcm_expand_query`

Use for:

- focused questions that need richer detail recovered from summaries
- evidence-oriented follow-up after `lcm_grep` or `lcm_describe`

This is the best recall tool when the user asks for:

- exact commands
- exact file paths
- precise timestamps
- root-cause chains

### `lcm_expand`

Treat it as a specialized sub-agent flow, not the default first step.

## Recommended workflow

1. Start with `lcm_grep` to find likely evidence.
2. Use `lcm_describe` when you have a summary or file ID.
3. Use `lcm_expand_query` when the answer requires precise recovery rather than a high-level summary.

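The workflow above can be sketched as a sequence of tool calls. The argument names (`pattern`, `id`, `query`) and the ID format are hypothetical illustrations, not the tools' documented schemas; consult the actual tool definitions before relying on them.

```json
[
  { "tool": "lcm_grep", "arguments": { "pattern": "ECONNRESET" } },
  { "tool": "lcm_describe", "arguments": { "id": "summary:1234" } },
  { "tool": "lcm_expand_query", "arguments": { "query": "exact command that triggered the ECONNRESET failure" } }
]
```

The shape matters more than the names: search first, inspect a known node second, and only then pay for precise expansion.
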
## Important guardrail

Do not infer exact details from summaries alone when the user needs evidence. Expand first, or state that the answer still needs expansion.
# Session lifecycle (`/new` and `/reset`)

This reference describes the current behavior on `main`.

## Short version

For stock `lossless-claw` on current `main`:

- OpenClaw handles `/new` and `/reset` as session-reset operations.
- `lossless-claw` does **not** currently register its own `before_reset` hook or a custom reset policy.
- `lossless-claw` prefers **`sessionKey`** as the stable identity for an LCM conversation.
- When the same `sessionKey` reappears with a new `sessionId`, `lossless-claw` updates the stored `sessionId` on the existing LCM conversation row instead of creating a brand-new LCM conversation.

## What that means in practice

If a user asks whether `/new` or `/reset` gives them a fresh LCM conversation, the answer is usually **no** under the current implementation.

They get a fresh OpenClaw session runtime, but LCM continuity still follows the stable `sessionKey` when one is available.

So today:

- `/new` and `/reset` can reset the runtime session
- but LCM history may continue in the same conversation row if the chat/thread keeps the same `sessionKey`

## Why

Current `lossless-claw` conversation resolution does this:

1. look up by `sessionKey` first
2. fall back to `sessionId` only when no `sessionKey` match exists
3. if the `sessionKey` already exists but the `sessionId` changed, update the stored `sessionId` on that same conversation

That behavior preserves continuity across session resets for the same chat identity.

## Important limitation

There is currently **no plugin-specific `/new` vs `/reset` split** in stock `lossless-claw` docs or runtime behavior.

If someone is asking for semantics like:

- `/new` keeps LCM history but rotates the transcript
- `/reset` archives the old LCM conversation and starts a new one

that is a **design/spec topic**, not current stock behavior.

## Safe operator guidance

When answering users:

- do not promise that `/new` or `/reset` clears LCM history
- explain that current stock behavior follows `sessionKey` continuity
- if they need a truly separate LCM history, use a different session-key context (for example a different chat/thread/binding) or explicit non-MVP migration/surgery tools

## Relation to `/status`

This session behavior is separate from `/status` metrics.

- `/status` reflects runtime session state and the last assembled request snapshot
- `/lossless` reflects LCM conversation state keyed by the plugin's conversation-mapping rules