@xdarkicex/openclaw-memory-libravdb 1.4.14 → 1.4.17

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
package/docs/models.md CHANGED
@@ -1,63 +1,54 @@
  # Model Strategy
 
- ## Why ONNX Over Ollama
+ The plugin uses local ONNX-first inference for embeddings and optional
+ abstractive summarization. That keeps prompt assembly local, predictable, and
+ available offline after assets are installed.
 
- The plugin uses ONNX-first local inference for embedding and optional abstractive summarization.
+ ## Why ONNX Over Ollama For The Critical Path
 
- ### Latency
+ `assemble` runs before each response build. An embedding request that crosses a
+ process and HTTP server boundary adds avoidable tail latency. Local ONNX
+ inference inside the sidecar keeps retrieval close to the database and avoids a
+ runtime dependency on a separate model server.
 
- `assemble` is on the critical path before every response build. An embedding request that crosses process and HTTP boundaries adds avoidable tail latency. Local ONNX inference inside the sidecar keeps the retrieval path in the low-millisecond range on the target hardware profile.
- `assemble` is on the critical path before every response build. An embedding
- request that crosses process and HTTP boundaries adds avoidable tail latency.
- Local ONNX inference inside the sidecar keeps the retrieval path local and
- predictable. On the current Apple M2 development machine, the repository's own
- benchmark harness measures roughly `16-23 ms/op` for MiniLM query embeddings and
- about `44 ms/op` for Nomic in the steady-state Go benchmark path.
+ ONNX assets can be provisioned once and reused without network access. Given
+ fixed weights and input, embeddings are deterministic enough for stable
+ similarity ordering and reproducible retrieval behavior.
 
- ### Offline Operation
+ The trade-off is artifact size. This project accepts that cost because local
+ latency and offline operation are part of the product contract.
 
- The plugin is designed to be local-first. Requiring a running Ollama server would break that guarantee. ONNX assets can be provisioned once and reused without network or daemon availability.
+ ## Default And Optional Embedding Profiles
 
- ### Determinism
+ The current default profile is `all-minilm-l6-v2`.
 
- ONNX inference is deterministic given fixed weights and input. Deterministic embeddings give stable similarity ordering and reproducible retrieval behavior.
+ MiniLM is the default because it keeps local retrieval within the target memory
+ envelope on macOS and is less fragile under ONNX Runtime execution than larger
+ profiles.
 
- ### Binary Size Trade-Off
+ `nomic-embed-text-v1.5` remains available as an explicit opt-in profile for
+ long-context retrieval experiments. Nomic's Matryoshka training makes
+ `64d -> 256d -> 768d` tiering principled rather than arbitrary truncation, but
+ its larger footprint makes it a less conservative default.
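+
+ As a sketch of what Matryoshka-style tiering means in practice (illustrative
+ code, not the plugin's API): a tier is the leading slice of the full vector,
+ re-normalized so cosine similarity stays meaningful.
+
+ ```ts
+ // Cut a Matryoshka-trained embedding down to a smaller tier (for example
+ // 768d -> 64d) and L2-normalize the prefix. The function name and shape
+ // are illustrative assumptions, not part of this package.
+ function truncateToTier(full: number[], dims: 64 | 256 | 768): number[] {
+   const head = full.slice(0, dims);
+   const norm = Math.sqrt(head.reduce((sum, x) => sum + x * x, 0)) || 1;
+   return head.map((x) => x / norm);
+ }
+ ```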
 
- Local models increase the artifact footprint. That is an explicit trade-off accepted by the architecture because predictable latency and offline operation are more important for this plugin than minimal package size.
+ For exact profile metadata, see [Embedding profiles](./embedding-profiles.md).
 
- ## Why `nomic-embed-text-v1.5`
+ ## Summarization
 
- This is the default embedding profile because it earned the role on two axes:
+ Compaction can run without an abstractive summarizer. When the optional T5-small
+ assets are not provisioned, the daemon degrades to the extractive path.
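+
+ A minimal sketch of that degrade decision (all names here are illustrative;
+ the daemon's real control flow is not shown in these docs):
+
+ ```ts
+ // Prefer the abstractive T5 path when its assets are loaded; otherwise fall
+ // back to the extractive summarizer.
+ type Summarizer = (turns: string[]) => string;
+
+ function pickSummarizer(t5: Summarizer | null, extractive: Summarizer): Summarizer {
+   return t5 ?? extractive;
+ }
+ ```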
 
- - long-context document support
- - Matryoshka structure for tiered retrieval
+ T5-small is the optional local abstractive summarizer because it is small enough
+ for CPU-local operation while still useful for session-cluster summaries. Larger
+ generative models would increase latency and operational complexity.
 
- The model’s Matryoshka training is what makes the `64d -> 256d -> 768d` cascade principled rather than arbitrary truncation.
+ ## Model Roles
 
- ## Why `all-minilm-l6-v2` Still Exists
+ | Model/profile | Role |
+ |---|---|
+ | `all-minilm-l6-v2` | Default lightweight embedding profile. |
+ | `nomic-embed-text-v1.5` | Opt-in long-context embedding profile. |
+ | T5-small | Optional local abstractive compaction summarizer. |
 
- MiniLM remains the lightweight fallback profile. It is useful when:
-
- - the full Nomic profile is unavailable
- - a smaller bundled footprint matters more than long-context or Matryoshka behavior
-
- It is no longer the quality-first default.
-
- ## Why T5-small for Summarization
-
- The abstractive summarization path is optional and must remain CPU-feasible on local machines. T5-small fits that constraint better than larger generative models:
-
- - small enough to run locally
- - expressive enough for session-cluster summarization
- - does not require a remote server
-
- The plugin still degrades gracefully to extractive compaction when the T5 assets are not provisioned.
-
- ## Model Roles in the System
-
- - Nomic embedder: quality-first retrieval path, Matryoshka tiers
- - MiniLM: fallback embedder
- - T5-small: optional higher-quality compaction summarizer
-
- The model strategy is therefore not “use ONNX everywhere because ONNX is fashionable.” It is “use ONNX where local deterministic inference is part of the product contract.”
+ External summarizer endpoints, such as Ollama, are optional. They are not part
+ of the required retrieval path.
@@ -0,0 +1,145 @@
+ # Performance And Tuning
+
+ This document keeps resource sizing, tuning knobs, and benchmark workflows out
+ of the root README.
+
+ ## Resource Expectations
+
+ The numbers below are local measurements from this repository as of
+ `2026-03-29`, unless labeled as estimates.
+
+ ### Disk
+
+ Measured local asset sizes:
+
+ - daemon binary: `7.7M`
+ - bundled Nomic model directory: `523M`
+ - bundled MiniLM fallback model directory: `87M`
+ - optional T5 summarizer directory: `371M`
+ - unpacked ONNX Runtime directory on macOS arm64: `44M`
+ - ONNX Runtime archive download on macOS arm64: `9.5M`
+
+ Vector payload lower bounds:
+
+ - MiniLM `384d`: `384 * 4 = 1536 bytes` per vector
+ - Nomic `768d`: `768 * 4 = 3072 bytes` per vector
+
+ Estimated lower-bound vector payload for `10,000` stored turns:
+
+ - MiniLM: about `15.4 MB`
+ - Nomic: about `30.7 MB`
+
+ Actual on-disk LibraVDB usage is higher because text, metadata, collection
+ structure, and index state are stored as well.
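+
+ A quick sketch of where those estimates come from, assuming `float32` storage
+ (`4` bytes per dimension; the helper below is illustrative, not a project API):
+
+ ```ts
+ // Lower-bound raw vector payload: dimensions * 4 bytes * stored vectors.
+ // Ignores text, metadata, and index overhead, as noted above.
+ function vectorPayloadMB(dims: number, vectors: number): number {
+   return (dims * 4 * vectors) / 1_000_000;
+ }
+
+ vectorPayloadMB(384, 10_000); // ~15.4 MB (MiniLM)
+ vectorPayloadMB(768, 10_000); // ~30.7 MB (Nomic)
+ ```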
+
+ ### Memory
+
+ Measured on Apple M2 by starting the daemon and reading RSS after startup:
+
+ - Nomic embedding path loaded without optional T5 summarizer: about `266 MB`
+ - Nomic plus local ONNX T5 summarizer loaded: about `503 MB`
+
+ Not yet bench-measured in this repo:
+
+ - RSS during active inference
+ - peak RSS during compaction of large clusters
+
+ ### CPU
+
+ Measured from the current Go benchmark harness on Apple M2:
+
+ - MiniLM bundled query embedding: about `22.6 ms/op`
+ - MiniLM onnx-local query embedding: about `16.3 ms/op`
+ - Nomic onnx-local query embedding: about `43.7 ms/op`
+
+ Measured from a one-off 40-query timing sample on Apple M2:
+
+ - Nomic query embedding `p50`: about `18.61 ms`
+ - Nomic query embedding `p95`: about `24.19 ms`
+
+ Measured from a one-off synthetic 50-turn compaction run with the current
+ extractive summarizer and Nomic embeddings:
+
+ - `50`-turn extractive compaction wall time: about `3175 ms`
+
+ Not yet bench-measured:
+
+ - equivalent Linux x64 embedding latency on a reference machine
+ - `50`-turn compaction wall time through the optional ONNX T5 abstractive path
+
+ ## Runtime Tuning Fields
+
+ Prefer the defaults unless you are measuring a specific problem. These fields
+ are advanced controls, not required install settings.
+
+ | Field | Effect |
+ |---|---|
+ | `topK` | Search result budget before prompt fitting. |
+ | `alpha`, `beta`, `gamma` | Hybrid scoring weights for similarity, scope, and recency-style signals. |
+ | `ingestionGateThreshold` | Durable-memory promotion threshold, default `0.35`. |
+ | `gatingWeights` | Domain-adaptive admission weights for conversational and technical memory. |
+ | `gatingTechNorm` | Normalization control for the technical-content gate. |
+ | `gatingCentroidK` | Number of centroid candidates used by the gate. |
+ | `compactionQualityWeight` | How much summary confidence affects retrieval score, default `0.5`. |
+ | `recencyLambdaSession` | Session-memory recency decay. |
+ | `recencyLambdaUser` | Durable user-memory recency decay. |
+ | `recencyLambdaGlobal` | Global-memory recency decay. |
+ | `tokenBudgetFraction` | Fraction of host context budget available to memory assembly. |
+ | `compactThreshold` | Explicit compaction trigger threshold. |
+ | `compactionThresholdFraction` | Dynamic trigger ratio when `compactThreshold` is unset, default `0.8`. |
+ | `compactSessionTokenBudget` | Auto-compaction budget since the last compaction, default `2000`; set `0` to disable. |
+ | `rpcTimeoutMs` | Sidecar RPC timeout, default `30000`. |
+ | `maxRetries` | Retry budget for sidecar RPC calls. |
+ | `logLevel` | Plugin log level. |
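+
+ As an illustration only, a small override set might look like the following
+ (field names come from the table; the surrounding object shape and where it
+ lives in your plugin configuration are assumptions):
+
+ ```ts
+ // Values marked "default" restate documented defaults from the table above;
+ // topK is an arbitrary example value, not a recommendation.
+ const tuningOverrides = {
+   topK: 12,
+   ingestionGateThreshold: 0.35,     // default
+   compactionThresholdFraction: 0.8, // default
+   compactSessionTokenBudget: 2000,  // default; 0 disables auto-compaction
+   rpcTimeoutMs: 30_000,             // default
+ };
+ ```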
+
+ Model-related fields live in [Embedding profiles](./embedding-profiles.md) and
+ [Models](./models.md).
+
+ ## LongMemEval Harness
+
+ The repository includes a local LongMemEval harness that runs the dataset
+ through the plugin layer and checks whether the assembled prompt still contains
+ the evidence turns.
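+
+ In outline, that check is string containment over the assembled prompt (a
+ sketch with illustrative names, not the harness's actual code):
+
+ ```ts
+ // A question counts as recalled when every evidence turn from the dataset
+ // survived retrieval and prompt fitting verbatim.
+ function evidenceRecalled(assembledPrompt: string, evidenceTurns: string[]): boolean {
+   return evidenceTurns.every((turn) => assembledPrompt.includes(turn));
+ }
+ ```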
+
+ The benchmark runner is committed, but the dataset and generated reports are
+ not. Keep downloaded data and local outputs under `benchmarks/longmemeval/`,
+ which is ignored by default.
+
+ Run it with:
+
+ ```bash
+ LONGMEMEVAL_DATA_FILE=/path/to/longmemeval_oracle.json pnpm run benchmark:longmemeval
+ ```
+
+ If you already have a daemon running and do not want the benchmark to spawn
+ another one, set:
+
+ ```bash
+ LONGMEMEVAL_USE_EXISTING_DAEMON=1 \
+ LONGMEMEVAL_SIDECAR_PATH=unix:/path/to/libravdb.sock \
+ pnpm run benchmark:longmemeval
+ ```
+
+ Optional controls:
+
+ - `LONGMEMEVAL_LIMIT` caps the number of questions
+ - `LONGMEMEVAL_TOPK` changes the search budget
+ - `LONGMEMEVAL_OUT_FILE` writes JSONL records for analysis
+
+ The harness writes JSONL incrementally, so partial results survive if a
+ transient daemon failure interrupts a long run. If the local test daemon drops
+ mid-run, the benchmark restarts it and retries the current question once before
+ recording an error result.
+
+ To score a hypothesis JSONL file with the official LongMemEval evaluator:
+
+ ```bash
+ LONGMEMEVAL_EVAL_REPO=/path/to/LongMemEval \
+ LONGMEMEVAL_HYPOTHESIS_FILE=/path/to/hypotheses.jsonl \
+ LONGMEMEVAL_DATA_FILE=/path/to/longmemeval_oracle.json \
+ OPENAI_API_KEY=... \
+ pnpm run benchmark:longmemeval:score
+ ```
+
+ The scorer wrapper shells out to the official Python evaluation script and then
+ prints aggregate metrics from the generated log when available.
@@ -2,7 +2,7 @@
  "id": "libravdb-memory",
  "name": "LibraVDB Memory",
  "description": "Persistent vector memory with three-tier hybrid scoring",
- "version": "1.4.14",
+ "version": "1.4.17",
  "kind": [
  "memory",
  "context-engine"
package/package.json CHANGED
@@ -1,6 +1,6 @@
  {
  "name": "@xdarkicex/openclaw-memory-libravdb",
- "version": "1.4.14",
+ "version": "1.4.17",
  "type": "module",
  "main": "./dist/index.js",
  "types": "./dist/index.d.ts",