@xdarkicex/openclaw-memory-libravdb 1.3.5

Files changed (80)
  1. package/README.md +46 -0
  2. package/docs/README.md +14 -0
  3. package/docs/architecture-decisions/README.md +6 -0
  4. package/docs/architecture-decisions/adr-001-onnx-over-ollama.md +21 -0
  5. package/docs/architecture-decisions/adr-002-libravdb-over-lancedb.md +19 -0
  6. package/docs/architecture-decisions/adr-003-convex-gating-over-threshold.md +27 -0
  7. package/docs/architecture-decisions/adr-004-sidecar-over-native-ts.md +21 -0
  8. package/docs/architecture.md +188 -0
  9. package/docs/contributing.md +76 -0
  10. package/docs/dependencies.md +38 -0
  11. package/docs/embedding-profiles.md +42 -0
  12. package/docs/gating.md +329 -0
  13. package/docs/implementation.md +381 -0
  14. package/docs/installation.md +272 -0
  15. package/docs/mathematics.md +695 -0
  16. package/docs/models.md +63 -0
  17. package/docs/problem.md +64 -0
  18. package/docs/security.md +86 -0
  19. package/openclaw.plugin.json +84 -0
  20. package/package.json +41 -0
  21. package/scripts/build-sidecar.sh +30 -0
  22. package/scripts/postinstall.js +169 -0
  23. package/scripts/setup.sh +20 -0
  24. package/scripts/setup.ts +505 -0
  25. package/scripts/sidecar-release.d.ts +4 -0
  26. package/scripts/sidecar-release.js +17 -0
  27. package/sidecar/cmd/inspect_onnx/main.go +105 -0
  28. package/sidecar/compact/gate.go +273 -0
  29. package/sidecar/compact/gate_test.go +85 -0
  30. package/sidecar/compact/summarize.go +345 -0
  31. package/sidecar/compact/summarize_test.go +319 -0
  32. package/sidecar/compact/tokens.go +11 -0
  33. package/sidecar/config/config.go +119 -0
  34. package/sidecar/config/config_test.go +75 -0
  35. package/sidecar/embed/engine.go +696 -0
  36. package/sidecar/embed/engine_test.go +349 -0
  37. package/sidecar/embed/matryoshka.go +93 -0
  38. package/sidecar/embed/matryoshka_test.go +150 -0
  39. package/sidecar/embed/onnx_local.go +319 -0
  40. package/sidecar/embed/onnx_local_test.go +159 -0
  41. package/sidecar/embed/profile_contract_test.go +71 -0
  42. package/sidecar/embed/profile_eval_test.go +923 -0
  43. package/sidecar/embed/profiles.go +39 -0
  44. package/sidecar/go.mod +21 -0
  45. package/sidecar/go.sum +30 -0
  46. package/sidecar/health/check.go +33 -0
  47. package/sidecar/health/check_test.go +55 -0
  48. package/sidecar/main.go +151 -0
  49. package/sidecar/model/encoder.go +222 -0
  50. package/sidecar/model/registry.go +262 -0
  51. package/sidecar/model/registry_test.go +102 -0
  52. package/sidecar/model/seq2seq.go +133 -0
  53. package/sidecar/server/rpc.go +343 -0
  54. package/sidecar/server/rpc_test.go +350 -0
  55. package/sidecar/server/transport.go +160 -0
  56. package/sidecar/store/libravdb.go +676 -0
  57. package/sidecar/store/libravdb_test.go +472 -0
  58. package/sidecar/summarize/engine.go +360 -0
  59. package/sidecar/summarize/engine_test.go +148 -0
  60. package/sidecar/summarize/onnx_local.go +494 -0
  61. package/sidecar/summarize/onnx_local_test.go +48 -0
  62. package/sidecar/summarize/profiles.go +52 -0
  63. package/sidecar/summarize/tokenizer.go +13 -0
  64. package/sidecar/summarize/tokenizer_hf.go +76 -0
  65. package/sidecar/summarize/util.go +13 -0
  66. package/src/cli.ts +205 -0
  67. package/src/context-engine.ts +195 -0
  68. package/src/index.ts +27 -0
  69. package/src/memory-provider.ts +24 -0
  70. package/src/openclaw-plugin-sdk.d.ts +53 -0
  71. package/src/plugin-runtime.ts +67 -0
  72. package/src/recall-cache.ts +34 -0
  73. package/src/recall-utils.ts +22 -0
  74. package/src/rpc.ts +84 -0
  75. package/src/scoring.ts +58 -0
  76. package/src/sidecar.ts +506 -0
  77. package/src/tokens.ts +36 -0
  78. package/src/types.ts +146 -0
  79. package/tsconfig.json +20 -0
  80. package/tsconfig.tests.json +12 -0
package/README.md ADDED
@@ -0,0 +1,46 @@
1
+ # LibraVDB Memory
2
+
3
+ ## Install
4
+
5
+ ```bash
6
+ openclaw plugins install @xdarkicex/openclaw-memory-libravdb
7
+ ```
8
+
9
+ The installer builds the Go sidecar, provisions the bundled embedding/runtime assets, optionally provisions the T5 summarizer, and fails fast if the sidecar cannot pass its startup health check.
10
+
11
+ Minimum host version:
12
+
13
+ - OpenClaw `>= 2026.3.22`
14
+
15
+ Security note:
16
+
17
+ - `scripts/setup.ts` verifies SHA-256 checksums for downloaded sidecar/runtime/model assets
18
+ - the sidecar installer downloads prebuilt sidecar release assets only from `github.com/xDarkicex/openclaw-memory-libravdb` releases
19
+ - after install, the plugin makes no required network calls for embedding or extractive compaction
20
+ - the only optional runtime network path is an explicitly configured remote summarizer endpoint such as `ollama-local`
21
+
22
+ ## Activate
23
+
24
+ Add this to `~/.openclaw/openclaw.json`:
25
+
26
+ ```json
27
+ {
28
+ "plugins": {
29
+ "slots": {
30
+ "memory": "libravdb-memory"
31
+ }
32
+ }
33
+ }
34
+ ```
35
+
36
+ Without the `plugins.slots.memory` entry, OpenClaw's default memory continues to run in parallel and this plugin does not take over the exclusive memory slot.
37
+
38
+ ## Verify
39
+
40
+ Run:
41
+
42
+ ```bash
43
+ openclaw memory status
44
+ ```
45
+
46
+ Expected output includes a readable status table showing the sidecar is running, stored turn/memory counts, the active ingestion gate threshold, and whether the abstractive summarizer is provisioned.
package/docs/README.md ADDED
@@ -0,0 +1,14 @@
1
+ # Documentation Index
2
+
3
+ - [installation.md](./installation.md) - Complete install, activation, verification, and troubleshooting reference.
4
+ - [architecture.md](./architecture.md) - End-to-end component model, turn lifecycle, compaction flow, and degraded behavior.
5
+ - [problem.md](./problem.md) - Technical argument for replacing the stock OpenClaw memory lifecycle in this use case.
6
+ - [mathematics.md](./mathematics.md) - Formal reference for hybrid scoring, decay, token budgeting, Matryoshka retrieval, and compaction.
7
+ - [gating.md](./gating.md) - Full derivation and calibration guide for the domain-adaptive gating scalar.
8
+ - [implementation.md](./implementation.md) - Non-obvious implementation decisions and their rationale.
9
+ - [dependencies.md](./dependencies.md) - Why LibraVDB and slab-based storage were chosen for this plugin.
10
+ - [models.md](./models.md) - ONNX model strategy, latency trade-offs, and shipped model roles.
11
+ - [security.md](./security.md) - Security model, untrusted-memory framing, isolation guarantees, and deletion boundaries.
12
+ - [contributing.md](./contributing.md) - Contributor workflow, prerequisites, and invariant test expectations.
13
+ - [architecture-decisions/README.md](./architecture-decisions/README.md) - Index of the repository ADRs.
14
+ - [embedding-profiles.md](./embedding-profiles.md) - Shipped embedding profile baseline and current profile metadata.
package/docs/architecture-decisions/README.md ADDED
@@ -0,0 +1,6 @@
1
+ # Architecture Decisions
2
+
3
+ - [adr-001-onnx-over-ollama.md](./adr-001-onnx-over-ollama.md)
4
+ - [adr-002-libravdb-over-lancedb.md](./adr-002-libravdb-over-lancedb.md)
5
+ - [adr-003-convex-gating-over-threshold.md](./adr-003-convex-gating-over-threshold.md)
6
+ - [adr-004-sidecar-over-native-ts.md](./adr-004-sidecar-over-native-ts.md)
package/docs/architecture-decisions/adr-001-onnx-over-ollama.md ADDED
@@ -0,0 +1,21 @@
1
+ # ADR-001: ONNX Over Ollama
2
+
3
+ ## Context
4
+
5
+ The plugin needs local embedding inference on the prompt-assembly critical path and optional local summarization for compaction.
6
+
7
+ ## Decision
8
+
9
+ Use ONNX-first local inference for embedding and optional summarization. Treat Ollama as an optional external backend, not the primary dependency.
10
+
11
+ ## Alternatives Considered
12
+
13
+ - Ollama for both embedding and summarization
14
+ - remote inference APIs
15
+
16
+ ## Consequences
17
+
18
+ - predictable latency
19
+ - deterministic embeddings
20
+ - offline operation
21
+ - larger local artifact footprint
package/docs/architecture-decisions/adr-002-libravdb-over-lancedb.md ADDED
@@ -0,0 +1,19 @@
1
+ # ADR-002: LibraVDB Over LanceDB
2
+
3
+ ## Context
4
+
5
+ The plugin needs multi-scope namespacing, delete-heavy compaction flows, and local-first operation without a Python dependency chain.
6
+
7
+ ## Decision
8
+
9
+ Use LibraVDB as the vector store.
10
+
11
+ ## Alternatives Considered
12
+
13
+ - LanceDB
14
+
15
+ ## Consequences
16
+
17
+ - better fit for collection-scoped lifecycle management
18
+ - more control over local operational behavior
19
+ - deeper ownership of vector store behavior and tuning
package/docs/architecture-decisions/adr-003-convex-gating-over-threshold.md ADDED
@@ -0,0 +1,27 @@
1
+ # ADR-003: Convex Gating Over Per-Domain Thresholds
2
+
3
+ ## Context
4
+
5
+ A single conversational gating scalar suppressed useful technical workflow memory because conversational redundancy and technical redundancy mean different things.
6
+
7
+ ## Decision
8
+
9
+ Use a convex mixture:
10
+
11
+ $$
12
+ G(t) = (1 - T(t))G_{\mathrm{conv}}(t) + T(t)G_{\mathrm{tech}}(t)
13
+ $$
14
+
15
+ instead of per-domain thresholds or user classification flags.
16
+
17
+ ## Alternatives Considered
18
+
19
+ - separate thresholds for technical vs conversational users
20
+ - explicit user-level mode flags
21
+ - a larger conversational heuristic rule set
22
+
23
+ ## Consequences
24
+
25
+ - one threshold instead of multiple user modes
26
+ - continuous behavior on mixed content
27
+ - greater observability through decomposed signals
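The convex mixture in the Decision section of ADR-003 can be illustrated with a minimal sketch. The function name `convexGate` and its argument names are assumptions made for illustration; the actual computation lives in the Go sidecar and may differ in detail.

```typescript
// Convex mixture from ADR-003: G = (1 - T) * G_conv + T * G_tech.
// `convexGate` is an illustrative name, not the sidecar's API.
function convexGate(techWeight: number, gConv: number, gTech: number): number {
  // Clamp T into [0, 1] so the result stays a convex combination.
  const t = Math.min(1, Math.max(0, techWeight));
  return (1 - t) * gConv + t * gTech;
}

// Mixed content: a 30% technical weight blends the two domain scores.
const mixedScore = convexGate(0.3, 0.2, 0.9);
console.log(mixedScore.toFixed(2)); // prints "0.41"
```

Because the result always lies between the two domain scores, a single threshold can be compared against it regardless of how technical the turn is.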
package/docs/architecture-decisions/adr-004-sidecar-over-native-ts.md ADDED
@@ -0,0 +1,21 @@
1
+ # ADR-004: Sidecar Over Native TypeScript
2
+
3
+ ## Context
4
+
5
+ The plugin requires local vector storage, ONNX inference, transport isolation, and bounded failure semantics that should not crash the host chat session.
6
+
7
+ ## Decision
8
+
9
+ Implement the memory engine as a Go sidecar with a narrow JSON-RPC transport boundary.
10
+
11
+ ## Alternatives Considered
12
+
13
+ - native TypeScript implementation
14
+ - WASM-only embedding and storage path
15
+
16
+ ## Consequences
17
+
18
+ - strong process isolation
19
+ - efficient local inference and storage integration
20
+ - extra packaging complexity
21
+ - a separate binary distribution story
package/docs/architecture.md ADDED
@@ -0,0 +1,188 @@
1
+ # System Architecture
2
+
3
+ This document describes the currently implemented architecture, not just the
4
+ design intent. Every component and data flow here maps to code in the
5
+ repository as of the current `main` branch.
6
+
7
+ ## 1. Component Map
8
+
9
+ ```mermaid
10
+ flowchart LR
11
+ Host["OpenClaw host process\n(TypeScript plugin shell)"]
12
+ CE["Context engine factory\nbootstrap / ingest / assemble / compact"]
13
+ MPS["memoryPromptSection\nuser+global recall"]
14
+ Runtime["Plugin runtime\nlazy sidecar startup + RPC client"]
15
+ Sidecar["Go sidecar process"]
16
+ RPC["JSON-RPC over newline-delimited frames\nUnix socket or TCP loopback on Windows"]
17
+ Store["LibraVDB store on disk"]
18
+ Session["session:<sessionId>"]
19
+ Turns["turns:<userId>"]
20
+ User["user:<userId>"]
21
+ Global["global"]
22
+ Dirty["_tier_dirty"]
23
+ Embed["ONNX embedding engine"]
24
+ Extractive["Extractive summarizer"]
25
+ T5["Optional ONNX T5 summarizer"]
26
+ Ollama["Optional Ollama summarizer endpoint"]
27
+
28
+ Host --> CE
29
+ Host --> MPS
30
+ CE --> Runtime
31
+ MPS --> Runtime
32
+ Runtime --> RPC
33
+ RPC --> Sidecar
34
+ Sidecar --> Embed
35
+ Sidecar --> Extractive
36
+ Sidecar --> T5
37
+ Sidecar --> Ollama
38
+ Sidecar --> Store
39
+ Store --> Session
40
+ Store --> Turns
41
+ Store --> User
42
+ Store --> Global
43
+ Store --> Dirty
44
+ ```
45
+
46
+ Implementation anchors:
47
+
48
+ - plugin entry: [`src/index.ts`](../src/index.ts)
49
+ - lazy runtime startup: [`src/plugin-runtime.ts`](../src/plugin-runtime.ts)
50
+ - sidecar supervision and endpoint discovery: [`src/sidecar.ts`](../src/sidecar.ts)
51
+ - transport listener: [`sidecar/server/transport.go`](../sidecar/server/transport.go)
52
+ - RPC method table: [`sidecar/server/rpc.go`](../sidecar/server/rpc.go)
53
+ - store: [`sidecar/store/libravdb.go`](../sidecar/store/libravdb.go)
54
+
55
+ ## 2. Single-Turn Data Flow
56
+
57
+ ### 2.1 `ingest`
58
+
59
+ Implemented in [`src/context-engine.ts`](../src/context-engine.ts).
60
+
61
+ For every non-heartbeat message:
62
+
63
+ 1. The host gets an RPC client from the plugin runtime. This lazily starts the
64
+ sidecar if it is not already running.
65
+ 2. The message is written to `session:<sessionId>` with `type: "turn"`.
66
+ 3. If `message.role === "user"`, the same text is written to `turns:<userId>`.
67
+ 4. The host calls `gating_scalar` with `{ userId, text }`.
68
+ 5. If `g >= ingestionGateThreshold`, the turn is promoted into
69
+ `user:<userId>` with the full gating decomposition in metadata.
70
+
71
+ Important constraints from the current implementation:
72
+
73
+ - session insertion is fire-and-forget
74
+ - durable promotion is best-effort
75
+ - gating failure does not fail the user turn
76
+ - assistant turns are stored in session memory but are not promoted into
77
+ durable user memory
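The `ingest` steps and constraints above can be summarized as a pure decision function. This is a sketch under stated assumptions: `planIngest` and `IngestPlan` are hypothetical names, and a `null` gating value stands in for a failed `gating_scalar` call. The real hook performs the writes over RPC rather than returning a plan.

```typescript
type Role = "user" | "assistant";

interface IngestPlan {
  collections: string[]; // collections that receive the turn
  promoted: boolean;     // true when the turn reaches durable user memory
}

// `g` is the scalar returned by `gating_scalar`, or null if that call failed.
function planIngest(
  role: Role,
  sessionId: string,
  userId: string,
  g: number | null,
  threshold: number,
): IngestPlan {
  // Every non-heartbeat message is written to session memory first.
  const collections = [`session:${sessionId}`];
  // Assistant turns stop here: they are never promoted.
  if (role !== "user") return { collections, promoted: false };
  collections.push(`turns:${userId}`);
  // A gating failure skips promotion instead of failing the turn.
  const promoted = g !== null && g >= threshold;
  if (promoted) collections.push(`user:${userId}`);
  return { collections, promoted };
}
```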
78
+
79
+ ### 2.2 `memoryPromptSection`
80
+
81
+ Implemented in [`src/memory-provider.ts`](../src/memory-provider.ts).
82
+
83
+ Before the main assembly path runs, the plugin builds a lightweight recall
84
+ section:
85
+
86
+ 1. search `user:<userId>`
87
+ 2. search `global`
88
+ 3. hybrid-rank the combined hits
89
+ 4. fit them to a fixed prompt budget of `800` estimated tokens
90
+ 5. return a textual header fragment for the host prompt
91
+
92
+ This path does not search session memory. Its job is durable context recall, not
93
+ active-turn recall.
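The rank-and-fit steps for the prompt section can be sketched as follows. Several pieces are assumptions: hits are taken to carry a hybrid score already, the four-characters-per-token estimate is a common heuristic rather than the plugin's confirmed estimator, and skipping oversize hits (instead of stopping at the first one) is one possible fitting policy.

```typescript
interface Hit {
  id: string;
  text: string;
  score: number; // hybrid score, assumed to be computed already
}

// Assumed heuristic of roughly four characters per token; the plugin's
// own estimator in src/tokens.ts may differ.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

// Merge both result sets, rank by score, then greedily keep hits that fit.
function buildRecall(userHits: Hit[], globalHits: Hit[], budget = 800): Hit[] {
  const ranked = [...userHits, ...globalHits].sort((a, b) => b.score - a.score);
  const selected: Hit[] = [];
  let used = 0;
  for (const hit of ranked) {
    const cost = estimateTokens(hit.text);
    if (used + cost > budget) continue; // assumption: skip and keep scanning
    selected.push(hit);
    used += cost;
  }
  return selected;
}
```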
94
+
95
+ ### 2.3 `assemble`
96
+
97
+ Implemented in [`src/context-engine.ts`](../src/context-engine.ts).
98
+
99
+ For the current query text (last message content), the host:
100
+
101
+ 1. builds an exclusion set from the most recent message ids
102
+ 2. searches `session:<sessionId>`, `user:<userId>`, and `global` in parallel
103
+ 3. hybrid-ranks the combined results using host-side scoring
104
+ 4. fits the ranked set to `tokenBudget * tokenBudgetFraction`
105
+ 5. prepends the selected memories as synthetic `system` messages
106
+ 6. returns both the expanded message array and a `systemPromptAddition`
107
+
108
+ Current implementation details that matter:
109
+
110
+ - user/global hits may be reused from the earlier prompt-section cache
111
+ - `assemble` falls back to the unmodified message list on RPC failure
112
+ - `assemble` does not mutate the original `messages` array in place; it returns
113
+ a new array
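The exclusion, prepend, and fallback behavior of `assemble` can be sketched with pure functions. The names `assembleMessages` and `assembleSafe`, the `recentWindow` parameter, and the synchronous `search` callback are all illustrative assumptions; ranking and budget fitting are taken as already done.

```typescript
interface ChatMessage {
  id?: string;
  role: "system" | "user" | "assistant";
  content: string;
}

interface Memory {
  id: string;
  text: string;
}

// Prepend recalled memories as synthetic system messages, skipping any
// memory whose id matches one of the most recent messages.
function assembleMessages(
  messages: ChatMessage[],
  ranked: Memory[],
  recentWindow: number,
): ChatMessage[] {
  const excluded = new Set(
    messages
      .slice(-recentWindow)
      .map((m) => m.id)
      .filter((id): id is string => id !== undefined),
  );
  const recalled: ChatMessage[] = ranked
    .filter((m) => !excluded.has(m.id))
    .map((m) => ({ role: "system" as const, content: m.text }));
  // A new array is returned; the caller's `messages` is left untouched.
  return [...recalled, ...messages];
}

// On any search failure, fall back to the unmodified message list.
function assembleSafe(
  messages: ChatMessage[],
  search: () => Memory[],
  recentWindow: number,
): ChatMessage[] {
  try {
    return assembleMessages(messages, search(), recentWindow);
  } catch {
    return messages;
  }
}
```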
114
+
115
+ ## 3. Compaction Data Flow
116
+
117
+ Implemented primarily in [`src/context-engine.ts`](../src/context-engine.ts)
118
+ and [`sidecar/compact/summarize.go`](../sidecar/compact/summarize.go).
119
+
120
+ When compaction is triggered:
121
+
122
+ 1. the host calls `compact_session` with `{ sessionId, force, targetSize }`
123
+ 2. the sidecar loads eligible non-summary turns from `session:<sessionId>`
124
+ 3. turns are sorted by `(ts, id)` and partitioned into deterministic
125
+ chronological clusters
126
+ 4. each cluster is routed to:
127
+ - extractive summarization by default
128
+ - optional abstractive summarization if `mean(gating_score) >= 0.60` and an
129
+ abstractive summarizer is ready
130
+ 5. the summary record is inserted back into the same session collection
131
+ 6. source turns are deleted only after summary insertion succeeds
132
+
133
+ Current implementation facts:
134
+
135
+ - compaction only touches `session:<sessionId>`
136
+ - raw source turns are preserved if summary insertion fails
137
+ - a delete failure is logged and the inserted summary is left in place
138
+ - compaction logs `cluster_id`, `mean_gating_score`, and `summarizer_used`
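The ordering and routing rules of the compaction flow can be sketched as follows. The real implementation is Go code in `sidecar/compact/summarize.go`; this TypeScript version is only an illustration. Fixed-size chunking is an assumed clustering rule, and the `Turn` shape is hypothetical.

```typescript
interface Turn {
  id: string;
  ts: number;
  gatingScore: number;
}

type Route = "extractive" | "abstractive";

// Deterministic order: timestamp first, id as the tie-breaker.
function sortTurns(turns: Turn[]): Turn[] {
  return [...turns].sort(
    (a, b) => a.ts - b.ts || (a.id < b.id ? -1 : a.id > b.id ? 1 : 0),
  );
}

// Assumption: clusters are consecutive chunks of `clusterSize` sorted turns.
function clusterTurns(turns: Turn[], clusterSize: number): Turn[][] {
  const size = Math.max(1, Math.floor(clusterSize));
  const sorted = sortTurns(turns);
  const clusters: Turn[][] = [];
  for (let i = 0; i < sorted.length; i += size) {
    clusters.push(sorted.slice(i, i + size));
  }
  return clusters;
}

// Extractive by default; abstractive only for high-value clusters when an
// abstractive summarizer is ready.
function routeCluster(cluster: Turn[], abstractiveReady: boolean): Route {
  const mean =
    cluster.reduce((sum, t) => sum + t.gatingScore, 0) / cluster.length;
  return abstractiveReady && mean >= 0.6 ? "abstractive" : "extractive";
}
```

Sorting on `(ts, id)` before partitioning is what makes the clusters reproducible across runs: two turns with the same timestamp always land in the same order.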
139
+
140
+ ## 4. Failure Modes and Degraded Behavior
141
+
142
+ The table below reflects current code behavior, with notes where it diverges
143
+ from the original spec phrasing.
144
+
145
+ | Failure | Current behavior | User impact |
146
+ |---|---|---|
147
+ | Sidecar unavailable on first RPC use | `getRpc()` rejects when lazy startup or health check fails | That hook fails or falls back, but plugin registration itself does not crash eagerly |
148
+ | Sidecar connection closes mid-session | `SidecarSupervisor` retries with exponential backoff until retry budget is exhausted, then enters degraded mode | Memory becomes unavailable until restart succeeds |
149
+ | `memoryPromptSection` RPC failure | individual searches are caught and replaced with empty result sets | Prompt section becomes empty rather than crashing the run |
150
+ | `assemble` RPC failure | returns original messages, original token count, and empty `systemPromptAddition` | That turn gets no recall augmentation |
151
+ | `ingest` gating or durable insert failure | session write already happened; durable promotion is skipped | Session memory survives, durable memory may miss that turn |
152
+ | Compaction summarizer unavailable | extractive summarizer remains required; optional abstractive path is skipped | Compaction still runs extractively when extractive is healthy |
153
+ | Disk full or insert error | Go RPC returns an error; TypeScript caller logs or degrades | New records are not stored, but chat continues |
154
+ | Empty lower Matryoshka tiers | cascade search naturally falls through because empty tiers return `best = 0.0` | Retrieval degrades to higher tiers without returning falsely confident exits |
155
+
156
+ Relevant code:
157
+
158
+ - retry/degraded behavior: [`src/sidecar.ts`](../src/sidecar.ts)
159
+ - lazy startup and health gate: [`src/plugin-runtime.ts`](../src/plugin-runtime.ts)
160
+ - compaction routing and insert/delete ordering:
161
+ [`sidecar/compact/summarize.go`](../sidecar/compact/summarize.go)
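The retry-then-degrade behavior described for a closed sidecar connection can be sketched as a backoff schedule plus a small state function. All parameters here (base delay, retry budget, cap) and the names `backoffDelays` and `nextState` are hypothetical; the actual values and logic live in `src/sidecar.ts`.

```typescript
// Exponential backoff: the delay doubles per attempt up to a cap.
// All three parameters are illustrative, not the supervisor's real values.
function backoffDelays(baseMs: number, maxRetries: number, capMs: number): number[] {
  const delays: number[] = [];
  for (let attempt = 0; attempt < maxRetries; attempt++) {
    delays.push(Math.min(capMs, baseMs * 2 ** attempt));
  }
  return delays;
}

type SupervisorState = "running" | "retrying" | "degraded";

// Once consecutive failures exceed the retry budget, enter degraded mode.
function nextState(consecutiveFailures: number, maxRetries: number): SupervisorState {
  if (consecutiveFailures === 0) return "running";
  return consecutiveFailures <= maxRetries ? "retrying" : "degraded";
}
```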
162
+
163
+ ## 5. Gating Decision Path
164
+
165
+ The gating decision spans both layers:
166
+
167
+ 1. `ingest` writes the user turn to `turns:<userId>`
168
+ 2. the host calls `gating_scalar`
169
+ 3. the Go sidecar performs exactly two searches:
170
+ - `SearchText("turns:<userId>", text, 10, nil)`
171
+ - `SearchText("user:<userId>", text, 5, nil)`
172
+ 4. the sidecar computes `GatingSignals` with [`compact.ComputeGating`](../sidecar/compact/gate.go)
173
+ 5. the host compares `g` to `ingestionGateThreshold`
174
+ 6. on pass, the host writes the turn into `user:<userId>` with all gating
175
+ metadata fields
176
+ 7. later, compaction computes the mean `gating_score` of a cluster and may route
177
+ high-value clusters to the abstractive summarizer
178
+
179
+ If the gate fails:
180
+
181
+ - the turn still exists in `session:<sessionId>`
182
+ - the turn still exists in `turns:<userId>`
183
+ - the turn is not promoted into `user:<userId>`
184
+ - downstream durable recall and compaction routing cannot use that turn's
185
+ gating metadata because it was never promoted
186
+
187
+ That makes the gate a durable-memory admission control, not a full-ingestion
188
+ blocker.
package/docs/contributing.md ADDED
@@ -0,0 +1,76 @@
1
+ # Contributing
2
+
3
+ ## Prerequisites
4
+
5
+ - Node.js `>= 22`
6
+ - Go `>= 1.22` for development and local fallback builds
7
+ - `pnpm`
8
+ - OpenClaw CLI for end-to-end plugin testing
9
+
10
+ ## Core Validation Commands
11
+
12
+ TypeScript and unit checks:
13
+
14
+ ```bash
15
+ pnpm check
16
+ ```
17
+
18
+ Integration tests:
19
+
20
+ ```bash
21
+ npm run test:integration
22
+ ```
23
+
24
+ Go sidecar tests:
25
+
26
+ ```bash
27
+ cd sidecar
28
+ env GOCACHE=/tmp/openclaw-memory-libravdb-gocache go test ./...
29
+ env GOCACHE=/tmp/openclaw-memory-libravdb-gocache go test -race ./...
30
+ ```
31
+
32
+ ## Local Sidecar Build
33
+
34
+ ```bash
35
+ bash scripts/build-sidecar.sh
36
+ ```
37
+
38
+ This creates `.sidecar-bin/libravdb-sidecar` and copies locally available bundled assets into `.sidecar-bin/`.
39
+
40
+ ## Gating Invariants
41
+
42
+ Do not weaken the gate invariants casually. The tests in `sidecar/compact/gate_test.go` check structural properties:
43
+
44
+ - empty-memory novelty
45
+ - saturation veto
46
+ - convex boundedness
47
+ - conversational collapse at `T = 0`
48
+ - technical collapse at `T = 1`
49
+ - non-overfiring conversational structure on code
50
+
51
+ If you add a new signal, it must preserve those invariants.
52
+
53
+ ## Calibration Coverage
54
+
55
+ There is not yet a dedicated `gate_calibration_test.go` golden set in the
56
+ repository. Current gating correctness is enforced by the invariant suite in
57
+ [`sidecar/compact/gate_test.go`](../sidecar/compact/gate_test.go).
58
+
59
+ If you introduce new signals or change weighting behavior, do not only update
60
+ the implementation. Add one of:
61
+
62
+ - a new invariant if the change alters a structural property of the gate
63
+ - a dedicated calibration/golden test file if the change adds new labeled
64
+ examples or expected decompositions
65
+
66
+ Do not rewrite expectations just to make regressions disappear.
67
+
68
+ ## PR Expectations
69
+
70
+ Before opening a PR:
71
+
72
+ - `pnpm check` must pass
73
+ - `go test -race ./...` from `sidecar/` must pass
74
+ - any new gating signal must come with calibration or invariant coverage
75
+ - any retrieval math change must be reflected in [mathematics.md](./mathematics.md)
76
+ - any gating change must be reflected in [gating.md](./gating.md)
package/docs/dependencies.md ADDED
@@ -0,0 +1,38 @@
1
+ # Dependency Rationale
2
+
3
+ ## LibraVDB over LanceDB
4
+
5
+ LibraVDB was chosen as the vector store because the plugin needs more than a single-table embedding lookup.
6
+
7
+ Key reasons:
8
+
9
+ - collection-level namespacing for:
10
+ - `session:*`
11
+ - `turns:*`
12
+ - `user:*`
13
+ - `global`
14
+ - delete and batch-delete operations used by compaction
15
+ - local-first Go-native operation with no Python bridge or remote service dependency
16
+ - retrieval infrastructure compatible with HNSW and future IVF/PQ-oriented layering
17
+
18
+ LanceDB was the natural alternative. It is a solid choice for straightforward durable vector retrieval, but using it here would still have required additional machinery around:
19
+
20
+ - scope isolation
21
+ - delete-heavy compaction flows
22
+ - local-first lifecycle management around a multi-scope memory design
23
+
24
+ The decision was therefore about operational fit, not abstract preference.
25
+
26
+ ## Slabby
27
+
28
+ The LibraVDB profiling work showed that this workload is allocation-sensitive, especially in repeated insert/search paths over vector-heavy collections.
29
+
30
+ Slab-style raw-vector storage was selected because:
31
+
32
+ - vectors are fixed-size payloads
33
+ - collections grow in bursty append patterns
34
+ - compaction and search create pressure on allocation churn
35
+
36
+ The measured conclusion from the internal profiling pass was that slab-backed raw-vector storage was performance-competitive with the plain in-memory backend while making allocation behavior more predictable. The main trade-off is reserved-but-unused capacity, which is acceptable for this local sidecar workload.
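The trade-off described above, predictable allocation in exchange for reserved-but-unused capacity, can be illustrated with a generic slab-style store for fixed-size vectors. This is a conceptual sketch only: `VectorSlabStore` is a hypothetical class and does not reflect the API of Slabby or LibraVDB.

```typescript
// Fixed-size vectors packed into preallocated slabs. Appends never move
// existing data; a new slab is allocated only when the last one is full.
class VectorSlabStore {
  private readonly slabs: Float32Array[] = [];
  private readonly dim: number;
  private readonly vectorsPerSlab: number;
  private count = 0;

  constructor(dim: number, vectorsPerSlab: number) {
    this.dim = dim;
    this.vectorsPerSlab = vectorsPerSlab;
  }

  // Returns the index of the stored vector.
  append(vec: ArrayLike<number>): number {
    if (vec.length !== this.dim) {
      throw new Error(`expected ${this.dim} dimensions, got ${vec.length}`);
    }
    const slot = this.count % this.vectorsPerSlab;
    if (slot === 0) {
      this.slabs.push(new Float32Array(this.dim * this.vectorsPerSlab));
    }
    this.slabs[this.slabs.length - 1].set(vec, slot * this.dim);
    return this.count++;
  }

  // Returns a view into the slab, not a copy.
  get(index: number): Float32Array {
    if (index < 0 || index >= this.count) throw new RangeError("index out of range");
    const slab = this.slabs[Math.floor(index / this.vectorsPerSlab)];
    const slot = index % this.vectorsPerSlab;
    return slab.subarray(slot * this.dim, (slot + 1) * this.dim);
  }

  // Capacity that is allocated but not yet holding a vector.
  reservedUnused(): number {
    return this.slabs.length * this.vectorsPerSlab - this.count;
  }
}
```

Bursty appends cost one allocation per slab rather than one per vector, which is the allocation-churn property the profiling conclusion refers to.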
37
+
38
+ The dependency is therefore justified by workload shape, not by novelty.
package/docs/embedding-profiles.md ADDED
@@ -0,0 +1,42 @@
1
+ # Embedding Profiles
2
+
3
+ The plugin now supports a lightweight `embeddingProfile` setting for named local model metadata defaults.
4
+
5
+ Default selection baseline as of `2026-03-28`:
6
+
7
+ - default embedding profile: `nomic-embed-text-v1.5`
8
+ - bundled fallback profile: `all-minilm-l6-v2`
9
+
10
+ Why:
11
+
12
+ - MiniLM and Nomic are equivalent on the current lexical and paraphrase baseline.
13
+ - Nomic materially outperforms MiniLM on cross-domain ranking quality.
14
+ - Nomic is the only profile that clears the long-context baseline once sliding-window document embedding is applied.
15
+ - Adversarial lexical traps remain reranker-window cases, but Nomic still narrows the relevant-vs-distractor margin materially.
16
+
17
+ Current shipped profile names:
18
+
19
+ - `all-minilm-l6-v2`
20
+ - family: `all-minilm-l6-v2`
21
+ - dimensions: `384`
22
+ - normalize: `true`
23
+ - max context tokens: `128`
24
+
25
+ - `nomic-embed-text-v1.5`
26
+ - family: `nomic-embed-text-v1.5`
27
+ - dimensions: `768`
28
+ - normalize: `true`
29
+ - max context tokens: `8192`
30
+
31
+ How it works:
32
+
33
+ - `embeddingProfile` supplies metadata defaults like family, dimensions, and normalize behavior.
34
+ - `onnx-local` still requires local model assets through `embeddingModelPath`, typically a directory containing `embedding.json`.
35
+ - The manifest may override or refine the profile, but explicit dimension mismatches fail closed.
36
+ - The sidecar store persists an embedding fingerprint, so reopening an existing store with a different effective model profile will fail instead of silently mixing vector spaces.
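The fail-closed behavior described above can be sketched with the two shipped profiles. The profile values come from the list in this document; `resolveProfile`, `fingerprint`, `checkFingerprint`, and the fingerprint string format are hypothetical stand-ins for the sidecar's Go implementation.

```typescript
interface EmbeddingProfile {
  family: string;
  dimensions: number;
  normalize: boolean;
  maxContextTokens: number;
}

const PROFILES: Record<string, EmbeddingProfile> = {
  "all-minilm-l6-v2": {
    family: "all-minilm-l6-v2",
    dimensions: 384,
    normalize: true,
    maxContextTokens: 128,
  },
  "nomic-embed-text-v1.5": {
    family: "nomic-embed-text-v1.5",
    dimensions: 768,
    normalize: true,
    maxContextTokens: 8192,
  },
};

// A manifest may refine the profile, but a conflicting dimension fails closed.
function resolveProfile(
  name: string,
  manifest: Partial<EmbeddingProfile> = {},
): EmbeddingProfile {
  const base = PROFILES[name];
  if (!base) throw new Error(`unknown embedding profile: ${name}`);
  if (manifest.dimensions !== undefined && manifest.dimensions !== base.dimensions) {
    throw new Error(
      `dimension mismatch: profile ${base.dimensions}, manifest ${manifest.dimensions}`,
    );
  }
  return { ...base, ...manifest };
}

// Assumed fingerprint format; the store's real encoding is not shown here.
function fingerprint(p: EmbeddingProfile): string {
  return `${p.family}:${p.dimensions}:${p.normalize}`;
}

// Refuse to reopen a store built with a different effective profile.
function checkFingerprint(stored: string, current: EmbeddingProfile): void {
  if (stored !== fingerprint(current)) {
    throw new Error(`embedding fingerprint mismatch: store has ${stored}`);
  }
}
```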
37
+
38
+ Recommended usage:
39
+
40
+ - `bundled` for the shipped default path, which now prefers Nomic and falls back to MiniLM if the primary profile is unavailable.
41
+ - `onnx-local` plus `embeddingProfile` when a power user wants a known model family like Nomic with local assets.
42
+ - treat remote/Ollama providers as future separate backend types, not as overloads of `custom-local`.