@xdarkicex/openclaw-memory-libravdb 1.4.14 → 1.4.17

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
package/docs/models.md CHANGED
@@ -1,63 +1,54 @@
  # Model Strategy
 
- ## Why ONNX Over Ollama
+ The plugin uses local ONNX-first inference for embeddings and optional
+ abstractive summarization. That keeps prompt assembly local, predictable, and
+ available offline after assets are installed.
 
- The plugin uses ONNX-first local inference for embedding and optional abstractive summarization.
+ ## Why ONNX Over Ollama For The Critical Path
 
- ### Latency
+ `assemble` runs before each response build. An embedding request that crosses a
+ process and HTTP server boundary adds avoidable tail latency. Local ONNX
+ inference inside the sidecar keeps retrieval close to the database and avoids a
+ runtime dependency on a separate model server.
 
- `assemble` is on the critical path before every response build. An embedding request that crosses process and HTTP boundaries adds avoidable tail latency. Local ONNX inference inside the sidecar keeps the retrieval path in the low-millisecond range on the target hardware profile.
- `assemble` is on the critical path before every response build. An embedding
- request that crosses process and HTTP boundaries adds avoidable tail latency.
- Local ONNX inference inside the sidecar keeps the retrieval path local and
- predictable. On the current Apple M2 development machine, the repository's own
- benchmark harness measures roughly `16-23 ms/op` for MiniLM query embeddings and
- about `44 ms/op` for Nomic in the steady-state Go benchmark path.
+ ONNX assets can be provisioned once and reused without network access. Given
+ fixed weights and input, embeddings are deterministic enough for stable
+ similarity ordering and reproducible retrieval behavior.
 
- ### Offline Operation
+ The trade-off is artifact size. This project accepts that cost because local
+ latency and offline operation are part of the product contract.
 
- The plugin is designed to be local-first. Requiring a running Ollama server would break that guarantee. ONNX assets can be provisioned once and reused without network or daemon availability.
+ ## Default And Optional Embedding Profiles
 
- ### Determinism
+ The current default profile is `all-minilm-l6-v2`.
 
- ONNX inference is deterministic given fixed weights and input. Deterministic embeddings give stable similarity ordering and reproducible retrieval behavior.
+ MiniLM is the default because it keeps local retrieval within the target memory
+ envelope on macOS and is less fragile under ONNX Runtime execution than larger
+ profiles.
 
- ### Binary Size Trade-Off
+ `nomic-embed-text-v1.5` remains available as an explicit opt-in profile for
+ long-context retrieval experiments. Nomic's Matryoshka training makes
+ `64d -> 256d -> 768d` tiering principled rather than arbitrary truncation, but
+ its larger footprint makes it a less conservative default.
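+
+ As a sketch of what Matryoshka-style tiering means in practice (illustrative
+ code, not the plugin's API): a tier is the leading slice of the full vector,
+ re-normalized so cosine similarity stays meaningful.
+
+ ```ts
+ // Cut a Matryoshka-trained embedding down to a smaller tier (for example
+ // 768d -> 64d) and L2-normalize the prefix. The function name and shape
+ // are illustrative assumptions, not part of this package.
+ function truncateToTier(full: number[], dims: 64 | 256 | 768): number[] {
+   const head = full.slice(0, dims);
+   const norm = Math.sqrt(head.reduce((sum, x) => sum + x * x, 0)) || 1;
+   return head.map((x) => x / norm);
+ }
+ ```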
 
- Local models increase the artifact footprint. That is an explicit trade-off accepted by the architecture because predictable latency and offline operation are more important for this plugin than minimal package size.
+ For exact profile metadata, see [Embedding profiles](./embedding-profiles.md).
 
- ## Why `nomic-embed-text-v1.5`
+ ## Summarization
 
- This is the default embedding profile because it earned the role on two axes:
+ Compaction can run without an abstractive summarizer. When the optional T5-small
+ assets are not provisioned, the daemon degrades to the extractive path.
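+
+ A minimal sketch of that degrade decision (all names here are illustrative;
+ the daemon's real control flow is not shown in these docs):
+
+ ```ts
+ // Prefer the abstractive T5 path when its assets are loaded; otherwise fall
+ // back to the extractive summarizer.
+ type Summarizer = (turns: string[]) => string;
+
+ function pickSummarizer(t5: Summarizer | null, extractive: Summarizer): Summarizer {
+   return t5 ?? extractive;
+ }
+ ```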
 
- - long-context document support
- - Matryoshka structure for tiered retrieval
+ T5-small is the optional local abstractive summarizer because it is small enough
+ for CPU-local operation while still useful for session-cluster summaries. Larger
+ generative models would increase latency and operational complexity.
 
- The model’s Matryoshka training is what makes the `64d -> 256d -> 768d` cascade principled rather than arbitrary truncation.
+ ## Model Roles
 
- ## Why `all-minilm-l6-v2` Still Exists
+ | Model/profile | Role |
+ |---|---|
+ | `all-minilm-l6-v2` | Default lightweight embedding profile. |
+ | `nomic-embed-text-v1.5` | Opt-in long-context embedding profile. |
+ | T5-small | Optional local abstractive compaction summarizer. |
 
- MiniLM remains the lightweight fallback profile. It is useful when:
-
- - the full Nomic profile is unavailable
- - a smaller bundled footprint matters more than long-context or Matryoshka behavior
-
- It is no longer the quality-first default.
-
- ## Why T5-small for Summarization
-
- The abstractive summarization path is optional and must remain CPU-feasible on local machines. T5-small fits that constraint better than larger generative models:
-
- - small enough to run locally
- - expressive enough for session-cluster summarization
- - does not require a remote server
-
- The plugin still degrades gracefully to extractive compaction when the T5 assets are not provisioned.
-
- ## Model Roles in the System
-
- - Nomic embedder: quality-first retrieval path, Matryoshka tiers
- - MiniLM: fallback embedder
- - T5-small: optional higher-quality compaction summarizer
-
- The model strategy is therefore not “use ONNX everywhere because ONNX is fashionable.” It is “use ONNX where local deterministic inference is part of the product contract.”
+ External summarizer endpoints, such as Ollama, are optional. They are not part
+ of the required retrieval path.
@@ -0,0 +1,145 @@
+ # Performance And Tuning
+
+ This document keeps resource sizing, tuning knobs, and benchmark workflows out
+ of the root README.
+
+ ## Resource Expectations
+
+ The numbers below are local measurements from this repository as of
+ `2026-03-29`, unless labeled as estimates.
+
+ ### Disk
+
+ Measured local asset sizes:
+
+ - daemon binary: `7.7M`
+ - bundled Nomic model directory: `523M`
+ - bundled MiniLM fallback model directory: `87M`
+ - optional T5 summarizer directory: `371M`
+ - unpacked ONNX Runtime directory on macOS arm64: `44M`
+ - ONNX Runtime archive download on macOS arm64: `9.5M`
+
+ Vector payload lower bounds:
+
+ - MiniLM `384d`: `384 * 4 = 1536 bytes` per vector
+ - Nomic `768d`: `768 * 4 = 3072 bytes` per vector
+
+ Estimated lower-bound vector payload for `10,000` stored turns:
+
+ - MiniLM: about `15.4 MB`
+ - Nomic: about `30.7 MB`
+
+ Actual on-disk LibraVDB usage is higher because text, metadata, collection
+ structure, and index state are stored as well.
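+
+ A quick sketch of where those estimates come from, assuming `float32` storage
+ (`4` bytes per dimension; the helper below is illustrative, not a project API):
+
+ ```ts
+ // Lower-bound raw vector payload: dimensions * 4 bytes * stored vectors.
+ // Ignores text, metadata, and index overhead, as noted above.
+ function vectorPayloadMB(dims: number, vectors: number): number {
+   return (dims * 4 * vectors) / 1_000_000;
+ }
+
+ vectorPayloadMB(384, 10_000); // ~15.4 MB (MiniLM)
+ vectorPayloadMB(768, 10_000); // ~30.7 MB (Nomic)
+ ```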
+
+ ### Memory
+
+ Measured on Apple M2 by starting the daemon and reading RSS after startup:
+
+ - Nomic embedding path loaded without optional T5 summarizer: about `266 MB`
+ - Nomic plus local ONNX T5 summarizer loaded: about `503 MB`
+
+ Not yet bench-measured in this repo:
+
+ - RSS during active inference
+ - peak RSS during compaction of large clusters
+
+ ### CPU
+
+ Measured from the current Go benchmark harness on Apple M2:
+
+ - MiniLM bundled query embedding: about `22.6 ms/op`
+ - MiniLM onnx-local query embedding: about `16.3 ms/op`
+ - Nomic onnx-local query embedding: about `43.7 ms/op`
+
+ Measured from a one-off 40-query timing sample on Apple M2:
+
+ - Nomic query embedding `p50`: about `18.61 ms`
+ - Nomic query embedding `p95`: about `24.19 ms`
+
+ Measured from a one-off synthetic 50-turn compaction run with the current
+ extractive summarizer and Nomic embeddings:
+
+ - `50`-turn extractive compaction wall time: about `3175 ms`
+
+ Not yet bench-measured:
+
+ - equivalent Linux x64 embedding latency on a reference machine
+ - `50`-turn compaction wall time through the optional ONNX T5 abstractive path
+
+ ## Runtime Tuning Fields
+
+ Prefer the defaults unless you are measuring a specific problem. These fields
+ are advanced controls, not required install settings.
+
+ | Field | Effect |
+ |---|---|
+ | `topK` | Search result budget before prompt fitting. |
+ | `alpha`, `beta`, `gamma` | Hybrid scoring weights for similarity, scope, and recency-style signals. |
+ | `ingestionGateThreshold` | Durable-memory promotion threshold, default `0.35`. |
+ | `gatingWeights` | Domain-adaptive admission weights for conversational and technical memory. |
+ | `gatingTechNorm` | Normalization control for the technical-content gate. |
+ | `gatingCentroidK` | Number of centroid candidates used by the gate. |
+ | `compactionQualityWeight` | How much summary confidence affects retrieval score, default `0.5`. |
+ | `recencyLambdaSession` | Session-memory recency decay. |
+ | `recencyLambdaUser` | Durable user-memory recency decay. |
+ | `recencyLambdaGlobal` | Global-memory recency decay. |
+ | `tokenBudgetFraction` | Fraction of host context budget available to memory assembly. |
+ | `compactThreshold` | Explicit compaction trigger threshold. |
+ | `compactionThresholdFraction` | Dynamic trigger ratio when `compactThreshold` is unset, default `0.8`. |
+ | `compactSessionTokenBudget` | Auto-compaction budget since the last compaction, default `2000`; set `0` to disable. |
+ | `rpcTimeoutMs` | Sidecar RPC timeout, default `30000`. |
+ | `maxRetries` | Retry budget for sidecar RPC calls. |
+ | `logLevel` | Plugin log level. |
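+
+ As an illustration only, a small override set might look like the following
+ (field names come from the table; the surrounding object shape and where it
+ lives in your plugin configuration are assumptions):
+
+ ```ts
+ // Values marked "default" restate documented defaults from the table above;
+ // topK is an arbitrary example value, not a recommendation.
+ const tuningOverrides = {
+   topK: 12,
+   ingestionGateThreshold: 0.35,     // default
+   compactionThresholdFraction: 0.8, // default
+   compactSessionTokenBudget: 2000,  // default; 0 disables auto-compaction
+   rpcTimeoutMs: 30_000,             // default
+ };
+ ```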
+
+ Model-related fields live in [Embedding profiles](./embedding-profiles.md) and
+ [Models](./models.md).
+
+ ## LongMemEval Harness
+
+ The repository includes a local LongMemEval harness that runs the dataset
+ through the plugin layer and checks whether the assembled prompt still contains
+ the evidence turns.
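+
+ In outline, that check is string containment over the assembled prompt (a
+ sketch with illustrative names, not the harness's actual code):
+
+ ```ts
+ // A question counts as recalled when every evidence turn from the dataset
+ // survived retrieval and prompt fitting verbatim.
+ function evidenceRecalled(assembledPrompt: string, evidenceTurns: string[]): boolean {
+   return evidenceTurns.every((turn) => assembledPrompt.includes(turn));
+ }
+ ```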
+
+ The benchmark runner is committed, but the dataset and generated reports are
+ not. Keep downloaded data and local outputs under `benchmarks/longmemeval/`,
+ which is ignored by default.
+
+ Run it with:
+
+ ```bash
+ LONGMEMEVAL_DATA_FILE=/path/to/longmemeval_oracle.json pnpm run benchmark:longmemeval
+ ```
+
+ If you already have a daemon running and do not want the benchmark to spawn
+ another one, set:
+
+ ```bash
+ LONGMEMEVAL_USE_EXISTING_DAEMON=1 \
+ LONGMEMEVAL_SIDECAR_PATH=unix:/path/to/libravdb.sock \
+ pnpm run benchmark:longmemeval
+ ```
+
+ Optional controls:
+
+ - `LONGMEMEVAL_LIMIT` caps the number of questions
+ - `LONGMEMEVAL_TOPK` changes the search budget
+ - `LONGMEMEVAL_OUT_FILE` writes JSONL records for analysis
+
+ The harness writes JSONL incrementally, so partial results survive if a
+ transient daemon failure interrupts a long run. If the local test daemon drops
+ mid-run, the benchmark restarts it and retries the current question once before
+ recording an error result.
+
+ To score a hypothesis JSONL file with the official LongMemEval evaluator:
+
+ ```bash
+ LONGMEMEVAL_EVAL_REPO=/path/to/LongMemEval \
+ LONGMEMEVAL_HYPOTHESIS_FILE=/path/to/hypotheses.jsonl \
+ LONGMEMEVAL_DATA_FILE=/path/to/longmemeval_oracle.json \
+ OPENAI_API_KEY=... \
+ pnpm run benchmark:longmemeval:score
+ ```
+
+ The scorer wrapper shells out to the official Python evaluation script and then
+ prints aggregate metrics from the generated log when available.
@@ -2,7 +2,7 @@
  "id": "libravdb-memory",
  "name": "LibraVDB Memory",
  "description": "Persistent vector memory with three-tier hybrid scoring",
- "version": "1.4.14",
+ "version": "1.4.17",
  "kind": [
  "memory",
  "context-engine"
package/package.json CHANGED
@@ -1,6 +1,6 @@
  {
  "name": "@xdarkicex/openclaw-memory-libravdb",
- "version": "1.4.14",
+ "version": "1.4.17",
  "type": "module",
  "main": "./dist/index.js",
  "types": "./dist/index.d.ts",