@xdarkicex/openclaw-memory-libravdb 1.3.9 → 1.3.12

package/README.md CHANGED
@@ -2,6 +2,19 @@
 
  ## Install
 
+ Recommended on macOS:
+
+ ```bash
+ brew tap xDarkicex/openclaw-libravdb-memory
+ brew install libravdbd
+ brew services start libravdbd
+ openclaw plugins install @xdarkicex/openclaw-memory-libravdb
+ ```
+
+ Then activate the plugin in `~/.openclaw/openclaw.json`.
+
+ Manual plugin install:
+
  ```bash
  openclaw plugins install @xdarkicex/openclaw-memory-libravdb
  ```
@@ -34,6 +47,11 @@ Phase 2 packaging assets now live under [`packaging/`](./packaging):
  - `packaging/launchd/com.xdarkicex.libravdbd.plist` for macOS LaunchAgents
  - `packaging/homebrew/libravdbd.rb.tmpl` as the source template for a generated Homebrew formula
 
+ Recommended service startup commands:
+
+ - macOS: `brew services start libravdbd`
+ - Linux: `systemctl --user enable --now libravdbd.service`
+
  ## Activate
 
  Add this to `~/.openclaw/openclaw.json`:
package/docs/README.md CHANGED
@@ -1,9 +1,17 @@
  # Documentation Index
 
+ Versioned `*-v*` design docs are the reviewed authoritative references when a
+ legacy non-versioned predecessor also exists. Older non-versioned docs are kept
+ to preserve project history and design evolution.
+
  - [installation.md](./installation.md) - Complete install, activation, verification, and troubleshooting reference.
  - [architecture.md](./architecture.md) - End-to-end component model, turn lifecycle, compaction flow, and degraded behavior.
  - [problem.md](./problem.md) - Technical argument for replacing the stock OpenClaw memory lifecycle in this use case.
- - [mathematics.md](./mathematics.md) - Formal reference for hybrid scoring, decay, token budgeting, Matryoshka retrieval, and compaction.
+ - [mathematics-v2.md](./mathematics-v2.md) - Formal reference for hybrid scoring, decay, token budgeting, Matryoshka retrieval, compaction, and planned two-pass retrieval.
+ - [compaction-evaluation.md](./compaction-evaluation.md) - Real-model benchmark notes for T5 summary confidence, Nomic-space preservation, and the hard preservation gate.
+ - [continuity.md](./continuity.md) - Continuity model for invariant context, preserved recent raw session tail, and retrieved older memory.
+ - [ast-v2.md](./ast-v2.md) - Reviewed authoritative AST partitioning reference for authored Markdown hard invariants, soft invariants, and variant lore.
+ - [ast.md](./ast.md) - Historical predecessor to `ast-v2.md`, kept to show design evolution and earlier bugs.
  - [gating.md](./gating.md) - Full derivation and calibration guide for the domain-adaptive gating scalar.
  - [implementation.md](./implementation.md) - Non-obvious implementation decisions and their rationale.
  - [dependencies.md](./dependencies.md) - Why LibraVDB and slab-based storage were chosen for this plugin.
package/docs/ast-v2.md ADDED
@@ -0,0 +1,125 @@
+ # TITLE: Mathematical Reference - Abstract Syntax Tree (AST) Partitioning
+
+ This document formalizes the heuristic mapping of user-authored Markdown documents (such as `agents.md` and `souls.md`) into the partitioned sets required by the two-pass retrieval system. It serves as the bridge between raw text ingestion and the rigorous corpus decomposition defined in `mathematics-v2.md`.
+
+ The design goal is to extract rigid behavioral rules (the invariant sets) from contextual lore (the variant set) automatically. This is achieved using a three-tier structural and semantic proxy, eliminating monolithic injection while protecting user constraints from token-budget starvation.
+
+ ## 1. The Document AST and Node Extraction
+
+ Let a raw Markdown document \(d_{\mathrm{raw}}\) be parsed into an Abstract Syntax Tree \(\mathcal{T}\).
+ Let \(E: \mathcal{T} \to N_d\) be an extraction function that flattens the tree into an ordered sequence of semantic leaf nodes \(N_d = \langle n_1, n_2, \dots, n_k \rangle\).
+
+ Each node \(n_i \in N_d\) has an associated structural kind assigned by the parser (e.g., `yuin/goldmark`), mapped by the function \(\kappa: N_d \to K\), where \(K\) is the set of supported Markdown node types:
+ \[ K = \{ \text{Paragraph}, \text{List}, \text{Blockquote}, \text{YAMLFrontmatter}, \text{Heading}, \dots \} \]
+
+ *Implemented in `sidecaragentparser.go`.*
+
+ ## 2. Formal Deontic Logic (\(\sigma\)) and the Kripke Frame
+
+ Structural types alone are insufficient proxies for intent. Narrative lore often resides in paragraphs, but authors frequently place critical instructions there as well (e.g., "You must always answer in JSON").
+
+ To detect these rules without deep NLP allocations, the parser evaluates raw node bytes against a Kripke Frame \((W, R)\) grounded in Standard Deontic Logic (SDL).
+
+ Let \(\mathcal{B}\) be the set of valid deontic trigger patterns: second-person imperative bigrams (e.g., "you must") and prohibitive keywords (e.g., "never"). A zero-allocation lexer scans the bytes for patterns in \(\mathcal{B}\), mapping them to Modalities (Obligatory, Forbidden, Permitted).
+
+ To guarantee logical consistency, the engine enforces Seriality (Axiom D). No world reachable from an Obligatory state may contain a Forbidden obligation on the same action:
+ \[ O(\phi) \implies \neg F(\text{next}(\phi)) \]
+
+ We formalize this as a binary promotion scalar \(\sigma: N_d \to \{0,1\}\). This function is specifically targeted at Paragraph nodes, as structural invariants bypass it:
+ \[
+ \sigma(n) = \begin{cases}
+ 1 & \text{if } \kappa(n) = \text{Paragraph} \land \text{SDL}(\mathcal{B}) \text{ detects a valid imperative} \\
+ 0 & \text{otherwise}
+ \end{cases}
+ \]
+
+ *Implemented via `NewDeonticFrame` and `EvaluateText` in the zero-allocation byte lexer.*
+
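As a minimal sketch of the promotion scalar (not the shipped lexer): the real `EvaluateText` path is byte-level and zero-allocation, while this illustration uses a tiny substring scan over an assumed pattern table. The `sigma` function and the `patterns` map are hypothetical names for this note only.

```go
package main

import (
	"fmt"
	"strings"
)

// modality tags from the SDL mapping.
type modality int

const (
	noModality modality = iota
	obligatory
	forbidden
)

// Illustrative stand-in for the trigger set B; the real lexer scans
// raw bytes without allocating.
var patterns = map[string]modality{
	"you must":   obligatory,
	"you should": obligatory,
	"never":      forbidden,
	"do not":     forbidden,
}

// sigma returns 1 when a Paragraph node's text contains a deontic
// trigger pattern, and 0 otherwise. Non-Paragraph kinds bypass the
// check entirely, since structural invariants are handled elsewhere.
func sigma(kind, text string) int {
	if kind != "Paragraph" {
		return 0
	}
	lower := strings.ToLower(text)
	for p := range patterns {
		if strings.Contains(lower, p) {
			return 1
		}
	}
	return 0
}

func main() {
	fmt.Println(sigma("Paragraph", "You must always answer in JSON.")) // 1: promoted
	fmt.Println(sigma("Paragraph", "The city was founded long ago."))  // 0: lore
	fmt.Println(sigma("List", "You must do X."))                       // 0: bypassed
}
```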
+ ## 3. The Three-Tier Structural Indicator Function \(\iota\)
+
+ To avoid the brittleness of a binary pass/fail budget, we distribute nodes across a three-tier priority hierarchy.
+
+ Let \(K_{\mathcal{I}1} \subset K\) be the subset of node kinds that represent hard authorial constraints:
+ \[ K_{\mathcal{I}1} = \{ \text{List}, \text{YAMLFrontmatter} \} \]
+
+ Let \(K_{\mathcal{I}2} \subset K\) be the subset of node kinds that represent soft constraints or stylistic guidelines:
+ \[ K_{\mathcal{I}2} = \{ \text{Blockquote} \} \]
+
+ We define the structural indicator function \(\iota: N_d \to \{0,1,2\}\) mapping each node to a specific tier:
+ \[
+ \iota(n) = \begin{cases}
+ 1 & \text{if } \kappa(n) \in K_{\mathcal{I}1} \quad \text{(Hard Invariant)} \\
+ 2 & \text{if } \kappa(n) \in K_{\mathcal{I}2} \lor \sigma(n) = 1 \quad \text{(Soft Invariant)} \\
+ 0 & \text{otherwise} \quad \text{(Variant Lore)}
+ \end{cases}
+ \]
+
+ *Proof of Reachability:* If a node is a Paragraph, \(\kappa(n) \notin K_{\mathcal{I}1}\) and \(\kappa(n) \notin K_{\mathcal{I}2}\). However, if the deontic lexer detects a rule, \(\sigma(n) = 1\), causing the logical OR condition for \(\iota(n) = 2\) to evaluate to true, successfully promoting the paragraph to a Soft Invariant.
+
+ ## 4. Corpus Decomposition and Set Integration
60
+
61
+ For any document \(d \in \mathbf{D}_{\text{agents}} \cup \mathbf{D}_{\text{souls}}\), the node set \(N_d\) is partitioned cleanly into three sets:
62
+ - **Hard Directives:** \(\mathcal{I}_{1d} = \{ n \in N_d \mid \iota(n) = 1 \}\)
63
+ - **Soft Directives:** \(\mathcal{I}_{2d} = \{ n \in N_d \mid \iota(n) = 2 \}\)
64
+ - **Contextual Lore:** \(\mathcal{V}_d = \{ n \in N_d \mid \iota(n) = 0 \}\)
65
+
66
+ *Partition Completeness:* Because \(\iota(n)\) maps every node to exactly one integer in \(\{0, 1, 2\}\), the resulting sets are mutually exclusive and collectively exhaustive:
67
+ \[ \mathcal{I}_{1d} \cup \mathcal{I}_{2d} \cup \mathcal{V}_d = N_d \quad \text{and} \quad \mathcal{I}_{1d} \cap \mathcal{I}_{2d} \cap \mathcal{V}_d = \emptyset \]
68
+
69
+ These sets integrate into the global corpus. Let \(\mathbf{D}_{\text{standard}}\) be the set of standard memory documents (non-core files). We formally define the standard variant node set as \(\mathcal{V}_{\text{standard}} = \bigcup_{d \in \mathbf{D}_{\text{standard}}} E(d)\). The global corpus is then:
70
+ \[ \mathcal{I}_1 = \bigcup_{d} \mathcal{I}_{1d} \qquad \mathcal{I}_2 = \bigcup_{d} \mathcal{I}_{2d} \qquad \mathcal{V} = \mathcal{V}_{\text{standard}} \cup \left( \bigcup_{d} \mathcal{V}_d \right) \]
71
+
72
+ By definition, any chunk \(n \in \mathcal{I}_{1d}\) inherits the hard startup injection guarantee from `mathematics-v2.md`. To clarify, \(G(q,n)\) represents the runtime *gating admission scalar*, not semantic relevance.
73
+ \[ \iota(n)=1 \implies G(q,n)=1 \quad \forall q \in \mathbf{Q} \]
74
+
75
+ ## 5. Authored Authority Boost for Variant Lore
76
+
77
+ Chunks in \(\mathcal{V}_d\) lose their invariant injection guarantee and must survive semantic vector retrieval. To ensure that agent-specific lore outcompetes general conversational memory, we enforce a strict authority override. For all \(n \in \mathcal{V}_d\) extracted from a core identity document:
78
+ \[ a_n = 1.0 \]
79
+ This guarantees that variant chunks of core files receive the maximum possible authored weight when scoring against the remaining token budget \(\tau_{\mathcal{V}}\).
80
+
81
+ ## 6. Token Budget Safety Bounds
82
+
83
+ Adversarial or malformed files containing excessively large constraint blocks could violate the strict prompt limits defined by the host. The system enforces split load-time bounds:
84
+
85
+ For Hard Invariants (\(\alpha_1\)):
86
+ \[ \sum_{n \in \mathcal{I}_{1d}} \mathrm{toks}(n) \le \alpha_1 \tau \implies \text{fast-fail and reject agent load if exceeded} \]
87
+
88
+ For Soft Invariants (\(\alpha_2\)):
89
+ \[ \sum_{n \in \mathcal{I}_{2d}} \mathrm{toks}(n) \le \alpha_2 \tau \implies \text{truncate by position if exceeded} \]
90
+
91
+ *Cumulative Verification Proof:* Let the total reserved invariant budget fraction be \(\alpha\), where \(\alpha_1 + \alpha_2 \le \alpha\). If both independent enforcement bounds are satisfied, then:
92
+ \[ \sum_{n \in \mathcal{I}_{1d}} \mathrm{toks}(n) + \sum_{n \in \mathcal{I}_{2d}} \mathrm{toks}(n) \le \alpha_1 \tau + \alpha_2 \tau = (\alpha_1 + \alpha_2)\tau \le \alpha \tau \]
93
+ This mathematically guarantees the overall token budget \(\tau\) is never breached by the combined invariant sets.
94
+
95
+ Under the unified assembly contract in [`mathematics-v2.md`](./mathematics-v2.md)
96
+ section 7.8 and [`continuity.md`](./continuity.md), these authored bounds are
97
+ combined with a separate recent-tail target fraction \(\beta\). The runtime
98
+ therefore treats the tiers with the following precedence:
99
+
100
+ 1. **Tier 1 / Hard invariants** must fit their startup reservation \(\alpha_1\tau\).
101
+ 2. **Mandatory recent tail** must preserve at least the minimum raw suffix \(T_{\mathrm{base}}\).
102
+ 3. **Tier 2 / Soft invariants** are injected by longest-prefix truncation under the effective budget
103
+ \[
104
+ \tau_{\mathcal{I}_2}^{\mathrm{eff}}=
105
+ \min\!\left(\alpha_2\tau,\,
106
+ \tau-\tau_{\mathcal{I}_1}-\mathrm{toks}(T_{\mathrm{base}})\right)
107
+ \]
108
+ 4. **Variant lore** competes only for the final residual budget after Tier 1,
109
+ the admitted Tier 2 prefix, and the exact recent tail are accounted for.
110
+
111
+ This makes \(\mathcal{I}_1\) and the minimum continuity suffix hard
112
+ constraints, while keeping \(\mathcal{I}_2\) order-preserving but elastic.
113
+
114
+ ## 7. The Document-Addressed Cache (\(\Psi\)) and Runtime Implications
115
+
116
+ The AST extraction, Deontic bigram evaluation, and partition logic are purely deterministic functions of \(d_{\mathrm{raw}}\). To prevent \(O(N)\) recomputation on every conversational turn, the system maintains a document-addressed cache:
117
+
118
+ \[ \Psi: \text{hash}(d_{\mathrm{raw}}, \text{tokenizer\_id}) \to \{\mathcal{I}_{1d}, \mathcal{I}_{2d}, \mathcal{V}_d, \text{budget}\} \]
119
+
120
+ Because the token estimator function \(\lceil \frac{|t|}{\chi(t)} \rceil\) depends on the active model tokenizer, \(\text{tokenizer\_id}\) is embedded in the hash key.
121
+
122
+ At runtime:
123
+ 1. **Tier 1 (\(\mathcal{I}_{1d}\))** is injected via an \(O(1)\) memory copy.
124
+ 2. **Tier 2 (\(\mathcal{I}_{2d}\))** is evaluated via an \(O(|\mathcal{I}_{2d}|)\) prefix sum to enforce position truncation under \(\tau_{\mathcal{I}_2}^{\mathrm{eff}}\).
125
+ 3. **Tier 0 (\(\mathcal{V}_d\))** bypasses re-parsing and feeds into the semantic Pass 1 vector retrieval only after the continuity layer removes the exact recent tail into \(T_{\mathrm{recent}}\), leaving \(\mathcal{V}_{\mathrm{rest}}\).
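A sketch of the \(\Psi\) key derivation described above, assuming SHA-256 and a byte separator (both illustrative; the shipped hash choice is not specified in this document). The point is only that the tokenizer identity is folded into the key, so a model switch cannot reuse stale token budgets:

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// cacheKey derives a document-addressed key for the Psi cache from
// the raw document bytes and the active tokenizer ID.
func cacheKey(rawDoc []byte, tokenizerID string) string {
	h := sha256.New()
	h.Write(rawDoc)
	h.Write([]byte{0}) // separator: doc and tokenizer bytes cannot run together
	h.Write([]byte(tokenizerID))
	return hex.EncodeToString(h.Sum(nil))
}

func main() {
	k1 := cacheKey([]byte("# agents"), "nomic-bert")
	k2 := cacheKey([]byte("# agents"), "t5-small")
	fmt.Println(k1 != k2) // true: same doc, different tokenizer, distinct entry
}
```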
package/docs/ast.md ADDED
@@ -0,0 +1,70 @@
+ # TITLE: Mathematical Reference - Abstract Syntax Tree (AST) Partitioning
+
+ Historical note: this document is preserved to show the project's design
+ evolution. The reviewed authoritative AST reference is
+ [`ast-v2.md`](./ast-v2.md).
+
+ This document formalizes the heuristic mapping of user-authored Markdown documents (such as `agents.md` and `souls.md`) into the partitioned sets required by the two-pass retrieval system. It serves as the bridge between raw text ingestion and the rigorous corpus decomposition defined in `mathematics-v2.md` Section 7.2.
+
+ The design goal is to extract rigid behavioral rules (the invariant set) from contextual lore (the variant set) automatically, using structural types as a mathematically stable proxy for user intent.
+
+ ## 1. The Document AST and Node Extraction
+
+ Let a raw Markdown document $d_{\mathrm{raw}}$ be parsed into an Abstract Syntax Tree $\mathcal{T}$.
+ Let $E: \mathcal{T} \to N_d$ be an extraction function that flattens the tree into an ordered sequence of semantic leaf nodes $N_d = \langle n_1, n_2, \dots, n_k \rangle$.
+
+ Each node $n_i \in N_d$ has an associated structural kind assigned by the parser (e.g., `yuin/goldmark`), mapped by the function $\kappa: N_d \to K$, where $K$ is the set of supported Markdown node types:
+ \[ K = \{ \text{Paragraph}, \text{List}, \text{Blockquote}, \text{YAMLFrontmatter}, \text{Heading}, \dots \} \]
+
+ *Implemented in `sidecaragentparser.go`.*
+
+ ## 2. The Structural Indicator Function $\iota$
+
+ To avoid document-level monolithic injection, we redefine the invariant membership predicate from `mathematics-v2.md` Section 7.2 at the node level.
+
+ Let $K_{\mathcal{I}} \subset K$ be the subset of node kinds structurally correlated with hard constraints, core directives, and programmatic definitions:
+ \[ K_{\mathcal{I}} = \{ \text{List}, \text{Blockquote}, \text{YAMLFrontmatter} \} \]
+
+ We define the structural indicator function $\iota: N_d \to \{0,1\}$ as:
+ \[
+ \iota(n) = \begin{cases}
+ 1 & \text{if } \kappa(n) \in K_{\mathcal{I}} \\
+ 0 & \text{otherwise}
+ \end{cases}
+ \]
+
+ **Note on structural proxy limits:** This heuristic relies entirely on the probability that human authors place absolute rules in lists/frontmatter and narrative lore in standard paragraphs. It is mathematically blind to the semantic meaning of the text.
+
+ ## 3. Corpus Decomposition and Set Integration
+
+ For any document $d \in \mathbf{D}_{\text{agents}} \cup \mathbf{D}_{\text{souls}}$, the node set $N_d$ is partitioned cleanly:
+ - **The Core Directives (Invariant):** $\mathcal{I}_d = \{ n \in N_d \mid \iota(n) = 1 \}$
+ - **The Contextual Lore (Variant):** $\mathcal{V}_d = \{ n \in N_d \mid \iota(n) = 0 \}$
+
+ This guarantees partition integrity:
+ \[ \mathcal{I}_d \cup \mathcal{V}_d = N_d \quad \text{and} \quad \mathcal{I}_d \cap \mathcal{V}_d = \emptyset \]
+
+ These sets feed directly into the global corpus partitioning:
+ \[ \mathcal{I} = \bigcup_{d} \mathcal{I}_d \qquad \mathcal{V} = \mathbf{D}_{\text{standard}} \cup \left( \bigcup_{d} \mathcal{V}_d \right) \]
+
+ By definition, any chunk $n \in \mathcal{I}_d$ inherits the hard startup guarantee from `mathematics-v2.md` Section 7.1:
+ \[ \iota(n)=1 \implies G(q,n)=1 \quad \forall q \in \mathbf{Q} \]
+
+ ## 4. Authored Authority Boost for Variant Lore
+
+ Chunks in $\mathcal{V}_d$ (such as standard paragraph nodes) lose their invariant guarantee and must survive the Pass 1 coarse semantic filter defined in `mathematics-v2.md` Section 7.4.
+
+ To ensure that agent-specific lore outcompetes general conversational memory during Pass 2, we enforce a strict authority override. For all $n \in \mathcal{V}_d$ extracted from a core identity document:
+ \[ a_n = 1.0 \]
+
+ Following the authority weight convex combination $d_{\omega}$ from `mathematics-v2.md` Section 7.3, this guarantees that variant chunks of core files receive the maximum possible authored weight when scoring against the remaining token budget $\tau_{\mathcal{V}}$.
+
+ ## 5. Token Budget Safety Bounds
+
+ Because invariants bypass all truncation (Section 7.8), an adversarial or malformed file containing an excessively large list block could violate the token budget:
+ \[ \sum_{n \in \mathcal{I}_d} \mathrm{toks}(n) > \tau \]
+
+ Therefore, the system must enforce a load-time safety bound on the extracted AST invariants:
+ \[ \tau_{\text{max\_invariant}} \le \alpha \tau \quad \text{where } \alpha \in (0, 1) \]
+
+ If parsing yields an $\mathcal{I}_d$ that exceeds $\alpha \tau$ (e.g., $\alpha = 0.4$, reserving 60% of context for variant history and tools), the parser must fast-fail and reject the agent load. This protects the runtime invariants dictated in `mathematics-v2.md` Section 7.10 from mathematically impossible token fits.
package/docs/compaction-evaluation.md ADDED
@@ -0,0 +1,182 @@
+ # Compaction Evaluation
+
+ This document records the first local evaluation pass for the Nomic-first
+ compaction confidence design.
+
+ The goal of the experiment was to compare:
+
+ - raw ONNX T5 decoder confidence
+ - Nomic-space preservation metrics
+ - the planned hybrid confidence model with a hard preservation gate
+
+ The evaluation harness lives in:
+
+ - `sidecar/cmd/eval_compaction`
+
+ It runs real local models:
+
+ - Nomic `nomic-embed-text-v1.5` for embedding-space evaluation
+ - ONNX T5-small for optional abstractive summarization
+
+ ## Why This Exists
+
+ The compaction system previously trusted T5 decoder confidence alone:
+
+ ```text
+ conf_t5(s, C) = exp(mean log p(token_i | token_<i, C))
+ ```
+
+ That quantity measures decoder self-consistency, not semantic preservation in
+ the retrieval geometry used by the vector store.
+
+ The new design evaluates every summary back in Nomic space:
+
+ ```text
+ Q_align(s, C) = cos(E(s), mu_C)
+ Q_cover(s, C) = mean_i max(0, cos(E(s), E(t_i)))
+ conf_nomic(s, C) = clamp01((Q_align + Q_cover) / 2)
+ ```
+
+ And then applies:
+
+ ```text
+ if Q_align < tau_preserve:
+     reject abstractive summary and fall back to extractive
+
+ confidence =
+     conf_nomic                                     for extractive
+     lambda * conf_nomic + (1 - lambda) * conf_t5   for T5 summaries
+ ```
+
+ with the current implementation constants:
+
+ - `tau_preserve = 0.65`
+ - `lambda = 0.8`
+
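The gate and hybrid blend can be sketched in Go using the shipped constants; the function names are assumed for this note, not the harness API. The example reproduces the arithmetic of the `gating_math` case reported in the results tables:

```go
package main

import "fmt"

const (
	tauPreserve = 0.65 // hard preservation gate on Q_align
	lambda      = 0.8  // weight on conf_nomic in the hybrid
)

func clamp01(x float64) float64 {
	if x < 0 {
		return 0
	}
	if x > 1 {
		return 1
	}
	return x
}

// confNomic averages alignment and coverage in embedding space.
func confNomic(qAlign, qCover float64) float64 {
	return clamp01((qAlign + qCover) / 2)
}

// finalConfidence applies the hard preservation gate; summaries that
// fail it fall back to extractive and are scored by conf_nomic alone,
// while surviving T5 summaries get the lambda-weighted hybrid.
func finalConfidence(qAlign, qCover, confT5 float64) (conf float64, extractiveFallback bool) {
	if qAlign < tauPreserve {
		return confNomic(qAlign, qCover), true
	}
	return lambda*confNomic(qAlign, qCover) + (1-lambda)*confT5, false
}

func main() {
	// gating_math: raw_conf 0.7790, align 0.9167, cover 0.8285
	conf, fallback := finalConfidence(0.9167, 0.8285, 0.7790)
	fmt.Printf("conf=%.4f fallback=%v\n", conf, fallback) // conf=0.8539 fallback=false
}
```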
+ ## Baseline Corpus
+
+ The current real-model pass uses 17 fixed synthetic clusters:
+
+ - 5 normal engineering-memory clusters
+ - 12 adversarial clusters designed to stress abstractive faithfulness
+
+ The adversarial set included:
+
+ - conflicting subsystem failures
+ - dense Go code and test logic
+ - four-way architectural decision bundles
+ - many-number and threshold-heavy cases
+ - continuity vs progress tension
+ - cross-domain product/math/infra mixtures
+ - token-budget contract distinctions
+ - conflicting proposed resolutions vs the actual root cause
+ - long noisy code-trace clusters with one decisive invariant
+ - topic-shift clusters that tempt generic summaries
+ - near-duplicate threshold statements from different subsystems
+
+ ## Results
+
+ ### Core Cases
+
+ | case | raw_conf | align | cover | final_conf | delta_conf |
+ |---|---:|---:|---:|---:|---:|
+ | auth_migration | 0.8501 | 0.9183 | 0.8342 | 0.8710 | +0.0209 |
+ | compaction_boundary | 0.6894 | 0.7983 | 0.7216 | 0.7458 | +0.0564 |
+ | gating_math | 0.7790 | 0.9167 | 0.8285 | 0.8539 | +0.0748 |
+ | release_pipeline | 0.8859 | 0.9697 | 0.8729 | 0.9142 | +0.0283 |
+ | adversarial_multi_fact | 0.8545 | 0.9052 | 0.7893 | 0.8487 | -0.0058 |
+
+ ### Adversarial Cases
+
+ | case | raw_conf | align | cover | final_conf | delta_conf |
+ |---|---:|---:|---:|---:|---:|
+ | adversarial_conflicting_errors | 0.8540 | 0.8579 | 0.7440 | 0.8116 | -0.0424 |
+ | adversarial_dense_go_code | 0.8945 | 0.9167 | 0.8212 | 0.8741 | -0.0205 |
+ | adversarial_four_way_decision_bundle | 0.8451 | 0.8651 | 0.7598 | 0.8190 | -0.0261 |
+ | adversarial_many_numbers | 0.6915 | 0.8854 | 0.7900 | 0.8084 | +0.1170 |
+ | adversarial_boundary_vs_progress | 0.7824 | 0.8993 | 0.8109 | 0.8406 | +0.0581 |
+ | adversarial_cross_domain_mix | 0.5240 | 0.8099 | 0.7327 | 0.7218 | +0.1978 |
+ | adversarial_token_budget_rules | 0.7938 | 0.9060 | 0.8249 | 0.8511 | +0.0573 |
+ | adversarial_conflicting_resolutions | 0.8600 | 0.9284 | 0.8560 | 0.8858 | +0.0258 |
+ | adversarial_long_noisy_code_trace | 0.8144 | 0.8565 | 0.7893 | 0.8212 | +0.0068 |
+ | adversarial_topic_shift_generic_bait | 0.8860 | 0.9166 | 0.8209 | 0.8722 | -0.0138 |
+ | adversarial_near_duplicate_thresholds | 0.8731 | 0.9123 | 0.8266 | 0.8702 | -0.0029 |
+
+ ## What We Learned
+
+ ### 1. T5 and Nomic are locally compatible
+
+ Every evaluated case produced:
+
+ ```text
+ Q_align > 0.65
+ ```
+
+ So the hard preservation gate did not trigger on the initial corpus. This is
+ useful evidence that the local T5 summaries are generally pointing in the same
+ semantic direction as the source cluster in Nomic space.
+
+ ### 2. The new math improves confidence grounding
+
+ The hybrid model changed confidence more often than it changed summary text.
+
+ This is still a meaningful result:
+
+ - positive deltas mean Nomic-space preservation validated summaries that T5
+ scored pessimistically
+ - negative deltas mean Nomic-space preservation penalized summaries that T5
+ scored too generously
+
+ The largest rescue was:
+
+ - `adversarial_cross_domain_mix`: `0.5240 -> 0.7218` (`+0.1978`)
+
+ The largest penalty was:
+
+ - `adversarial_conflicting_errors`: `0.8540 -> 0.8116` (`-0.0424`)
+
+ So even without fallback, the confidence signal is more retrieval-aware than the
+ old T5-only design.
+
+ ### 3. Harsher corpus plus threshold sweep sharpened the evidence
+
+ Even after expanding the corpus to 17 cases, the shipped gate still did not
+ trip:
+
+ ```text
+ tau_preserve = 0.65 -> 0 trips
+ tau_preserve = 0.75 -> 0 trips
+ tau_preserve = 0.85 -> 2 trips
+ ```
+
+ The two cases that fall below `0.85` are:
+
+ - `compaction_boundary`
+ - `adversarial_cross_domain_mix`
+
+ So the section-5 preservation machinery is now evidenced in two ways:
+
+ - unit tests prove the hard fallback path when `Q_align < tau_preserve`
+ - real-model threshold sweeps show where the current corpus begins to stress
+ geometric drift, even though the shipped `0.65` threshold remains conservative
+
+ This means the earlier evidence gap has narrowed: the corpus is now harsh enough
+ to differentiate thresholds and expose weaker cases, even if it still does not
+ force fallback at the default gate.
+
+ Remaining interpretation questions are now about calibration, not about whether
+ the gate machinery exists or whether the evaluation corpus can separate stronger
+ and weaker summaries.
+
+ ## Current Interpretation
+
+ The preservation gate is not decorative, but its first practical value is
+ confidence correction rather than frequent fallback.
+
+ That is still a win:
+
+ - T5 remains the lightweight local decoder
+ - Nomic remains the canonical retrieval geometry
+ - compaction confidence is now judged in the same space retrieval uses
+
+ This is the mathematically coherent compromise for a stable shippable plugin.