@xdarkicex/openclaw-memory-libravdb 1.3.19 → 1.3.21
- package/README.md +1 -1
- package/docs/README.md +1 -1
- package/docs/architecture.md +8 -14
- package/docs/implementation.md +2 -2
- package/docs/mathematics-v2.md +485 -0
- package/package.json +1 -1
- package/src/context-engine.ts +50 -7
- package/src/memory-provider.ts +19 -81
- package/src/openclaw-plugin-sdk.d.ts +6 -1
- package/src/scoring.ts +93 -1
- package/src/sidecar.ts +31 -1
- package/src/temporal.ts +385 -0
- package/src/tokens.ts +16 -0
- package/src/types.ts +9 -0
package/README.md
CHANGED
|
@@ -131,7 +131,7 @@ If your daemon runs elsewhere, set an explicit `sidecarPath`, for example:
|
|
|
131
131
|
|
|
132
132
|
```text
|
|
133
133
|
OpenClaw host
|
|
134
|
-
-> memoryPromptSection (
|
|
134
|
+
-> memoryPromptSection (static capability header)
|
|
135
135
|
-> memory runtime bridge (built-in memory_search)
|
|
136
136
|
-> context engine (bootstrap / ingest / assemble / compact)
|
|
137
137
|
-> plugin runtime
|
package/docs/README.md
CHANGED
|
@@ -9,7 +9,7 @@ to preserve project history and design evolution.
|
|
|
9
9
|
- [uninstall.md](./uninstall.md) - Clean shutdown and removal guide for the plugin, daemon, and optional local data.
|
|
10
10
|
- [architecture.md](./architecture.md) - End-to-end component model, turn lifecycle, compaction flow, and degraded behavior.
|
|
11
11
|
- [problem.md](./problem.md) - Technical argument for replacing the stock OpenClaw memory lifecycle in this use case.
|
|
12
|
-
- [mathematics-v2.md](./mathematics-v2.md) - Formal reference for hybrid scoring, decay, token budgeting, Matryoshka retrieval, compaction,
|
|
12
|
+
- [mathematics-v2.md](./mathematics-v2.md) - Formal reference for hybrid scoring, decay, token budgeting, Matryoshka retrieval, compaction, planned two-pass retrieval, and temporal-compositional projection.
|
|
13
13
|
- [compaction-evaluation.md](./compaction-evaluation.md) - Real-model benchmark notes for T5 summary confidence, Nomic-space preservation, and the hard preservation gate.
|
|
14
14
|
- [continuity.md](./continuity.md) - Continuity model for invariant context, preserved recent raw session tail, and retrieved older memory.
|
|
15
15
|
- [ast-v2.md](./ast-v2.md) - Reviewed authoritative AST partitioning reference for authored Markdown hard invariants, soft invariants, and variant lore.
|
package/docs/architecture.md
CHANGED
|
@@ -10,7 +10,7 @@ repository as of the current `main` branch.
|
|
|
10
10
|
flowchart LR
|
|
11
11
|
Host["OpenClaw host process\n(TypeScript plugin shell)"]
|
|
12
12
|
CE["Context engine factory\nbootstrap / ingest / assemble / compact"]
|
|
13
|
-
MPS["memoryPromptSection\
|
|
13
|
+
MPS["memoryPromptSection\nstatic header"]
|
|
14
14
|
Runtime["Plugin runtime\nlazy daemon connect + RPC client"]
|
|
15
15
|
Sidecar["Go daemon process"]
|
|
16
16
|
RPC["JSON-RPC over newline-delimited frames\nUnix socket or TCP loopback on Windows"]
|
|
@@ -28,7 +28,6 @@ flowchart LR
|
|
|
28
28
|
Host --> CE
|
|
29
29
|
Host --> MPS
|
|
30
30
|
CE --> Runtime
|
|
31
|
-
MPS --> Runtime
|
|
32
31
|
Runtime --> RPC
|
|
33
32
|
RPC --> Sidecar
|
|
34
33
|
Sidecar --> Embed
|
|
@@ -80,17 +79,12 @@ Important constraints from the current implementation:
|
|
|
80
79
|
|
|
81
80
|
Implemented in [`src/memory-provider.ts`](../src/memory-provider.ts).
|
|
82
81
|
|
|
83
|
-
Before the main assembly path runs, the plugin
|
|
84
|
-
|
|
82
|
+
Before the main assembly path runs, the plugin returns a lightweight static
|
|
83
|
+
header fragment that tells the host persistent memory is active.
|
|
85
84
|
|
|
86
|
-
|
|
87
|
-
|
|
88
|
-
|
|
89
|
-
4. fit them to a fixed prompt budget of `800` estimated tokens
|
|
90
|
-
5. return a textual header fragment for the host prompt
|
|
91
|
-
|
|
92
|
-
This path does not search session memory. Its job is durable context recall, not
|
|
93
|
-
active-turn recall.
|
|
85
|
+
This path is intentionally synchronous and does not perform RPC retrieval.
|
|
86
|
+
Durable recall now happens entirely inside `assemble`, which keeps embedded
|
|
87
|
+
prompt construction compatible with OpenClaw's synchronous memory prompt hook.
|
|
94
88
|
|
|
95
89
|
### 2.3 `assemble`
|
|
96
90
|
|
|
@@ -107,7 +101,7 @@ For the current query text (last message content), the host:
|
|
|
107
101
|
|
|
108
102
|
Current implementation details that matter:
|
|
109
103
|
|
|
110
|
-
- user/global hits
|
|
104
|
+
- user/global hits are cached within `assemble` and reused on repeated queries
|
|
111
105
|
- `assemble` falls back to the unmodified message list on RPC failure
|
|
112
106
|
- `assemble` does not mutate the original `messages` array in place; it returns
|
|
113
107
|
a new array
|
|
@@ -146,7 +140,7 @@ from the original spec phrasing.
|
|
|
146
140
|
|---|---|---|
|
|
147
141
|
| Daemon unavailable on first RPC use | `getRpc()` rejects when first connect or health check fails | That hook fails or falls back, but plugin registration itself does not crash eagerly |
|
|
148
142
|
| Daemon connection closes mid-session | `SidecarSupervisor` retries with exponential backoff until retry budget is exhausted, then enters degraded mode | Memory becomes unavailable until the daemon is reachable again |
|
|
149
|
-
| `memoryPromptSection`
|
|
143
|
+
| `memoryPromptSection` failure | returns a static header with no RPC dependency | Prompt section stays available and does not block the run |
|
|
150
144
|
| `assemble` RPC failure | returns original messages, original token count, and empty `systemPromptAddition` | That turn gets no recall augmentation |
|
|
151
145
|
| `ingest` gating or durable insert failure | session write already happened; durable promotion is skipped | Session memory survives, durable memory may miss that turn |
|
|
152
146
|
| Compaction summarizer unavailable | extractive summarizer remains required; optional abstractive path is skipped | Compaction still runs extractively when extractive is healthy |
|
package/docs/implementation.md
CHANGED
|
@@ -18,8 +18,8 @@ Why:
|
|
|
18
18
|
- `ingest`
|
|
19
19
|
- `assemble`
|
|
20
20
|
- `compact`
|
|
21
|
-
- the lightweight memory prompt section remains useful as a
|
|
22
|
-
durable
|
|
21
|
+
- the lightweight memory prompt section remains useful as a synchronous
|
|
22
|
+
capability/header hook while durable recall stays in `assemble`
|
|
23
23
|
|
|
24
24
|
This is why the code registers both `registerContextEngine("libravdb-memory", …)`
|
|
25
25
|
and `registerMemoryPromptSection(...)` instead of relying on only one hook.
|
package/docs/mathematics-v2.md
CHANGED
|
@@ -1386,3 +1386,488 @@ retaining for future work:
|
|
|
1386
1386
|
These ideas are intentionally preserved as future mathematics rather than
|
|
1387
1387
|
current contract. The present document remains normative only for the formulas
|
|
1388
1388
|
and invariants already defined above.
|
|
1389
|
+
|
|
1390
|
+
## 9. Temporal-Compositional Retrieval Extension
|
|
1391
|
+
|
|
1392
|
+
This section defines a narrow, mathematically principled extension to the
|
|
1393
|
+
$\mathrm{Proj}()$ operator that corrects the single-turn-centric failure mode on
|
|
1394
|
+
temporal-compositional queries such as "how many days before $X$ did $Y$
|
|
1395
|
+
happen."
|
|
1396
|
+
|
|
1397
|
+
The extension is self-contained. Every formula in this section is bounded and
|
|
1398
|
+
correct under the existing parameter domains. The assembly law
|
|
1399
|
+
$C_{\mathrm{total}}(q)$, the budget hierarchy, and the runtime invariants in
|
|
1400
|
+
Section 7.10 and [`continuity.md`](./continuity.md) are unchanged. Only the
|
|
1401
|
+
internal definition of $\mathrm{Proj}(\mathcal{V}_{\mathrm{rest}}, q)$ is
|
|
1402
|
+
refined.
|
|
1403
|
+
|
|
1404
|
+
Implemented in: `src/temporal.ts` (planned).
|
|
1405
|
+
|
|
1406
|
+
### 9.1 Motivation: The Set-Scoring Gap
|
|
1407
|
+
|
|
1408
|
+
The standard Pass-2 score $S_{\mathrm{final}}(d)$ maximizes over individual
|
|
1409
|
+
candidates:
|
|
1410
|
+
|
|
1411
|
+
$$
|
|
1412
|
+
\mathcal{C}_2(q)
|
|
1413
|
+
=
|
|
1414
|
+
\mathrm{TopK}_{d \in \mathcal{C}_1(q)}
|
|
1415
|
+
\left(k_2,\, S_{\mathrm{final}}(d)\right)
|
|
1416
|
+
$$
|
|
1417
|
+
|
|
1418
|
+
This is optimal when the query is answerable from a single best document. It
|
|
1419
|
+
fails when the query requires two complementary date-bearing turns to be
|
|
1420
|
+
jointly present, neither of which is individually the best semantic match.
|
|
1421
|
+
|
|
1422
|
+
The failure pattern is:
|
|
1423
|
+
|
|
1424
|
+
- Turn $A$ covers the query topic broadly, so it earns a high
|
|
1425
|
+
$S_{\mathrm{final}}$ and wins alone.
|
|
1426
|
+
- Turn $B$ contains the missing date anchor, but earns only a moderate
|
|
1427
|
+
$S_{\mathrm{final}}$ and is evicted.
|
|
1428
|
+
- Neither $A$ alone nor $B$ alone answers the question.
|
|
1429
|
+
|
|
1430
|
+
The fix is to move from
|
|
1431
|
+
$\underset{d}{\arg\max}\; S_{\mathrm{final}}(d)$ to a coverage-aware set
|
|
1432
|
+
selector that rewards a set of candidates for jointly maximizing semantic
|
|
1433
|
+
relevance, temporal anchor density, and event-slot coverage while penalizing
|
|
1434
|
+
redundancy automatically via marginal scoring.
|
|
1435
|
+
|
|
1436
|
+
### 9.2 Temporal Query Indicator $\xi(q)\in[0,1]$
|
|
1437
|
+
|
|
1438
|
+
To avoid mutating the retrieval contract for normal queries, the extension
|
|
1439
|
+
activates only when the query is detected to be temporal-compositional.
|
|
1440
|
+
Define the temporal query indicator using the same saturating-sum pattern as
|
|
1441
|
+
$T(t)$ in [`gating.md`](./gating.md):
|
|
1442
|
+
|
|
1443
|
+
$$
|
|
1444
|
+
\xi(q)
|
|
1445
|
+
=
|
|
1446
|
+
\min\!\left(
|
|
1447
|
+
\frac{\displaystyle\sum_i s_i \cdot \mathbf{1}[\mathrm{tpat}_i(q)]}
|
|
1448
|
+
{\theta_{\xi}^{\mathrm{norm}}},
|
|
1449
|
+
1
|
|
1450
|
+
\right)
|
|
1451
|
+
$$
|
|
1452
|
+
|
|
1453
|
+
where the shipped temporal patterns $\mathrm{tpat}_i$ are zero-allocation
|
|
1454
|
+
byte-lexer matches over the query text, including but not limited to
|
|
1455
|
+
"how many days", "how long", "before", "after", "since", "first", "earlier",
|
|
1456
|
+
"which came first", "when did", and "between".
|
|
1457
|
+
|
|
1458
|
+
Each pattern carries a weight $s_i > 0$. The default normalization constant is
|
|
1459
|
+
$\theta_{\xi}^{\mathrm{norm}} = 1.5$, so two strong temporal signals saturate
|
|
1460
|
+
$\xi(q)=1$.
|
|
1461
|
+
|
|
1462
|
+
By construction, the $\min(\cdot, 1)$ clamp and non-negative numerator
|
|
1463
|
+
guarantee:
|
|
1464
|
+
|
|
1465
|
+
$$
|
|
1466
|
+
\xi(q)\in[0,1]
|
|
1467
|
+
$$
|
|
1468
|
+
|
|
1469
|
+
If no temporal patterns match, $\xi(q)=0$ and the extension contributes
|
|
1470
|
+
nothing to the scoring formula.
|
|
1471
|
+
|
|
1472
|
+
The extension activates only when $\xi(q)\ge\theta_\xi$, with shipped default
|
|
1473
|
+
$\theta_\xi = 0.3$. Below that threshold, the standard $\mathrm{Proj}$ path
|
|
1474
|
+
executes without modification.
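
The indicator and its activation gate can be sketched as follows. This is a minimal illustration, not the shipped lexer: the phrases come from the list above, but the per-pattern weights and the names `TEMPORAL_PATTERNS`, `temporalIndicator`, and `temporalModeActive` are assumptions, and plain substring matching stands in for the zero-allocation byte-lexer.

```typescript
// Hypothetical pattern table: phrases are taken from the text above, but the
// weights s_i are illustrative assumptions, not the shipped values.
const TEMPORAL_PATTERNS: ReadonlyArray<{ phrase: string; weight: number }> = [
  { phrase: "how many days", weight: 1.0 },
  { phrase: "how long", weight: 1.0 },
  { phrase: "which came first", weight: 1.0 },
  { phrase: "when did", weight: 0.75 },
  { phrase: "before", weight: 0.75 },
  { phrase: "after", weight: 0.75 },
  { phrase: "since", weight: 0.5 },
  { phrase: "between", weight: 0.5 },
];

const THETA_XI_NORM = 1.5; // default normalization constant from the text
const THETA_XI = 0.3; // default activation threshold from the text

// xi(q) = min( sum_i s_i * 1[tpat_i(q)] / theta_norm , 1 )
function temporalIndicator(query: string): number {
  const text = query.toLowerCase();
  let weightedHits = 0;
  for (const pattern of TEMPORAL_PATTERNS) {
    // Substring matching is a simplification of the byte-lexer patterns.
    if (text.indexOf(pattern.phrase) !== -1) {
      weightedHits += pattern.weight;
    }
  }
  return Math.min(weightedHits / THETA_XI_NORM, 1);
}

// The extension activates only when xi(q) >= theta_xi.
function temporalModeActive(query: string): boolean {
  return temporalIndicator(query) >= THETA_XI;
}
```

Under these assumed weights, a query that matches both "how many days" and "before" saturates at $\xi(q)=1$, while a query with no matching phrase yields $\xi(q)=0$ and stays on the standard path.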
|
|
1475
|
+
|
|
1476
|
+
### 9.3 Temporal Anchor Density $A(d)\in[0,1]$
|
|
1477
|
+
|
|
1478
|
+
A document's temporal anchor density measures how many explicit date or time
|
|
1479
|
+
expressions it contains, normalized by a saturation constant and clamped at 1.
|
|
1480
|
+
Define the anchor count over a lightweight anchor pattern set $\mathcal{P}_A$
|
|
1481
|
+
(ISO dates, relative day expressions, clock times, calendar words, Unix
|
|
1482
|
+
timestamps):
|
|
1483
|
+
|
|
1484
|
+
$$
|
|
1485
|
+
A(d)
|
|
1486
|
+
=
|
|
1487
|
+
\min\!\left(
|
|
1488
|
+
\frac{\displaystyle\sum_j \mathbf{1}[\mathrm{anch}_j(d)]}
|
|
1489
|
+
{\theta_A^{\mathrm{norm}}},
|
|
1490
|
+
1
|
|
1491
|
+
\right)
|
|
1492
|
+
$$
|
|
1493
|
+
|
|
1494
|
+
The default $\theta_A^{\mathrm{norm}} = 3$, so three or more distinct anchor
|
|
1495
|
+
expressions saturate $A(d)=1$.
|
|
1496
|
+
|
|
1497
|
+
Again, the clamp guarantees:
|
|
1498
|
+
|
|
1499
|
+
$$
|
|
1500
|
+
A(d)\in[0,1]
|
|
1501
|
+
$$
|
|
1502
|
+
|
|
1503
|
+
$A(d)$ is a precomputed document-level scalar. It does not depend on the query
|
|
1504
|
+
and should be cached in the same document-addressed cache $\Psi$ defined in
|
|
1505
|
+
[`ast-v2.md`](./ast-v2.md) Section 7 alongside tier partition and budget
|
|
1506
|
+
metadata. The value must be recomputed whenever a stored document is created,
|
|
1507
|
+
updated, or regenerated by compaction.
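
A minimal sketch of the anchor-density computation, assuming one regular expression per anchor family named above. The concrete patterns and the names `ANCHOR_PATTERNS` and `anchorDensity` are illustrative assumptions; the sketch also reads the sum as counting matched pattern families rather than individual occurrences.

```typescript
// Illustrative anchor pattern set P_A (assumption): one regex per family.
const ANCHOR_PATTERNS: ReadonlyArray<RegExp> = [
  /\b\d{4}-\d{2}-\d{2}\b/, // ISO date, e.g. 2024-03-01
  /\b(?:yesterday|today|tomorrow|\d+ days? ago)\b/i, // relative day expression
  /\b\d{1,2}:\d{2}\b/, // clock time, e.g. 14:30
  // Calendar words; "may" is omitted here to avoid matching the modal verb.
  /\b(?:monday|tuesday|wednesday|thursday|friday|saturday|sunday|january|february|march|april|june|july|august|september|october|november|december)\b/i,
  /\b\d{10}\b/, // Unix timestamp in seconds
];

const THETA_A_NORM = 3; // default normalization constant from the text

// A(d) = min( sum_j 1[anch_j(d)] / theta_A_norm , 1 )
function anchorDensity(text: string): number {
  let matched = 0;
  for (const pattern of ANCHOR_PATTERNS) {
    // The patterns carry no global flag, so test() keeps no state between calls.
    if (pattern.test(text)) {
      matched += 1;
    }
  }
  return Math.min(matched / THETA_A_NORM, 1);
}
```

Because the result depends only on the document text, it can be computed once at write time and stored with the other cached document metadata.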
|
|
1508
|
+
|
|
1509
|
+
### 9.4 Event-Slot Extraction and Marginal Coverage $\Delta\Phi$
|
|
1510
|
+
|
|
1511
|
+
#### 9.4.1 Event-Slot Extraction
|
|
1512
|
+
|
|
1513
|
+
For a temporal-compositional query $q$, define the event-slot sequence:
|
|
1514
|
+
|
|
1515
|
+
$$
|
|
1516
|
+
E(q)=\langle e_1, e_2, \dots, e_m \rangle
|
|
1517
|
+
$$
|
|
1518
|
+
|
|
1519
|
+
where each $e_j$ is a short noun-phrase span extracted from $q$ by a
|
|
1520
|
+
lightweight span extractor: named entities plus the main noun phrase preceding
|
|
1521
|
+
and following any detected temporal-pattern word. The extractor returns at
|
|
1522
|
+
most $m_{\max}=4$ slots to bound cost.
|
|
1523
|
+
|
|
1524
|
+
When $|E(q)|=0$, all coverage terms evaluate to zero and the formula degrades
|
|
1525
|
+
cleanly.
|
|
1526
|
+
|
|
1527
|
+
#### 9.4.2 Per-Slot Coverage Indicator
|
|
1528
|
+
|
|
1529
|
+
For each slot $e_j$ and candidate document $d$, define the binary slot-match
|
|
1530
|
+
indicator:
|
|
1531
|
+
|
|
1532
|
+
$$
|
|
1533
|
+
\phi_j(d)
|
|
1534
|
+
=
|
|
1535
|
+
\mathbf{1}\!\left[\varphi(e_j)^\top \varphi(d) \ge \theta_e\right]
|
|
1536
|
+
\in \{0,1\}
|
|
1537
|
+
$$
|
|
1538
|
+
|
|
1539
|
+
where $\varphi(\cdot)$ is the same unit-normalized embedding function defined
|
|
1540
|
+
in Section 7.1, and $\theta_e \in [-1,1]$ is the slot-match similarity
|
|
1541
|
+
threshold, default $\theta_e = 0.50$.
|
|
1542
|
+
|
|
1543
|
+
#### 9.4.3 Marginal Coverage
|
|
1544
|
+
|
|
1545
|
+
For a set $\mathcal{S}$ of already-selected documents, define the marginal
|
|
1546
|
+
coverage of adding $d$:
|
|
1547
|
+
|
|
1548
|
+
$$
|
|
1549
|
+
\Delta\Phi(d, \mathcal{S}, q)
|
|
1550
|
+
=
|
|
1551
|
+
\frac{1}{\max(|E(q)|, 1)}
|
|
1552
|
+
\sum_{j=1}^{|E(q)|}
|
|
1553
|
+
\phi_j(d)
|
|
1554
|
+
\cdot
|
|
1555
|
+
\mathbf{1}\!\left[\nexists d' \in \mathcal{S} : \phi_j(d') = 1\right]
|
|
1556
|
+
$$
|
|
1557
|
+
|
|
1558
|
+
This is the fraction of all event slots in $E(q)$ that $d$ newly covers, that is, slots matched by $d$ and by no document already in $\mathcal{S}$.
|
|
1559
|
+
|
|
1560
|
+
The outer factor is in $(0,1]$, the sum counts at most $|E(q)|$ binary terms,
|
|
1561
|
+
and therefore:
|
|
1562
|
+
|
|
1563
|
+
$$
|
|
1564
|
+
\Delta\Phi(d, \mathcal{S}, q)\in[0,1]
|
|
1565
|
+
$$
|
|
1566
|
+
|
|
1567
|
+
The indicator
|
|
1568
|
+
$\mathbf{1}\!\left[\nexists d' \in \mathcal{S} : \phi_j(d') = 1\right]$
|
|
1569
|
+
ensures that slots already covered by a previously selected document
|
|
1570
|
+
contribute zero marginal gain, automatically penalizing redundant anchor turns
|
|
1571
|
+
without a separate explicit penalty term.
|
|
1572
|
+
|
|
1573
|
+
As $|\mathcal{S}|$ grows, $\Delta\Phi(d,\mathcal{S},q)$ is monotone
|
|
1574
|
+
non-increasing: new selections can only cover more slots, leaving fewer
|
|
1575
|
+
uncovered slots for later candidates to gain credit for.
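
The slot-match indicator and marginal coverage can be sketched as below. The sketch assumes embeddings are already unit-normalized number arrays; the names `Vec`, `slotMatch`, and `marginalCoverage` are hypothetical and do not come from the package.

```typescript
type Vec = ReadonlyArray<number>;

const THETA_E = 0.5; // default slot-match threshold from the text

function dot(a: Vec, b: Vec): number {
  return a.reduce((sum, x, i) => sum + x * (b[i] ?? 0), 0);
}

// phi_j(d): true when the slot and document embeddings meet the threshold.
function slotMatch(slot: Vec, doc: Vec): boolean {
  return dot(slot, doc) >= THETA_E;
}

// DeltaPhi(d, S, q): share of all slots that doc covers and that no
// already-selected document covers.
function marginalCoverage(
  doc: Vec,
  selected: ReadonlyArray<Vec>,
  slots: ReadonlyArray<Vec>,
): number {
  let newlyCovered = 0;
  for (const slot of slots) {
    const alreadyCovered = selected.some((s) => slotMatch(slot, s));
    if (!alreadyCovered && slotMatch(slot, doc)) {
      newlyCovered += 1;
    }
  }
  // max(|E(q)|, 1) guards the empty-slot case against division by zero.
  return newlyCovered / Math.max(slots.length, 1);
}
```

With two orthogonal slots, a document aligned with the first slot earns $0.5$ against an empty selection and $0$ once an equivalent document is already selected, which is the redundancy penalty described above.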
|
|
1576
|
+
|
|
1577
|
+
### 9.5 Coverage-Augmented Blended Score
|
|
1578
|
+
$S_{\mathrm{proj}}(d,\mathcal{S},q)\in[0,1]$
|
|
1579
|
+
|
|
1580
|
+
Define the coverage-augmented score for candidate $d$ given already-selected
|
|
1581
|
+
set $\mathcal{S}$ and query $q$:
|
|
1582
|
+
|
|
1583
|
+
$$
|
|
1584
|
+
S_{\mathrm{cov}}(d, \mathcal{S}, q)
|
|
1585
|
+
=
|
|
1586
|
+
\mu \cdot S_{\mathrm{final}}(d)
|
|
1587
|
+
+ \nu \cdot A(d)
|
|
1588
|
+
+ \rho \cdot \Delta\Phi(d, \mathcal{S}, q)
|
|
1589
|
+
$$
|
|
1590
|
+
|
|
1591
|
+
where:
|
|
1592
|
+
|
|
1593
|
+
$$
|
|
1594
|
+
\mu,\nu,\rho\in[0,1],
|
|
1595
|
+
\qquad
|
|
1596
|
+
\mu+\nu+\rho=1
|
|
1597
|
+
$$
|
|
1598
|
+
|
|
1599
|
+
The default shipped weights are $\mu=0.60$, $\nu=0.20$, and $\rho=0.20$.
|
|
1600
|
+
|
|
1601
|
+
Blend this with the standard score using $\xi(q)$ as an interpolation scalar:
|
|
1602
|
+
|
|
1603
|
+
$$
|
|
1604
|
+
S_{\mathrm{proj}}(d, \mathcal{S}, q)
|
|
1605
|
+
=
|
|
1606
|
+
(1 - \xi(q)) \cdot S_{\mathrm{final}}(d)
|
|
1607
|
+
+ \xi(q) \cdot S_{\mathrm{cov}}(d, \mathcal{S}, q)
|
|
1608
|
+
$$
|
|
1609
|
+
|
|
1610
|
+
Substituting $S_{\mathrm{cov}}$ yields:
|
|
1611
|
+
|
|
1612
|
+
$$
|
|
1613
|
+
S_{\mathrm{proj}}
|
|
1614
|
+
=
|
|
1615
|
+
\bigl(1 - \xi(1-\mu)\bigr)\cdot S_{\mathrm{final}}
|
|
1616
|
+
+ \xi\nu \cdot A
|
|
1617
|
+
+ \xi\rho \cdot \Delta\Phi
|
|
1618
|
+
$$
|
|
1619
|
+
|
|
1620
|
+
All coefficients are non-negative, and they sum to one:
|
|
1621
|
+
|
|
1622
|
+
$$
|
|
1623
|
+
\bigl(1 - \xi(1-\mu)\bigr) + \xi\nu + \xi\rho
|
|
1624
|
+
=
|
|
1625
|
+
1 - \xi + \xi\mu + \xi\nu + \xi\rho
|
|
1626
|
+
=
|
|
1627
|
+
1 - \xi + \xi(\mu+\nu+\rho)
|
|
1628
|
+
=
|
|
1629
|
+
1
|
|
1630
|
+
$$
|
|
1631
|
+
|
|
1632
|
+
Because $S_{\mathrm{final}}(d)$, $A(d)$, and
|
|
1633
|
+
$\Delta\Phi(d,\mathcal{S},q)$ all lie in $[0,1]$, this is a proper convex
|
|
1634
|
+
combination, so:
|
|
1635
|
+
|
|
1636
|
+
$$
|
|
1637
|
+
S_{\mathrm{proj}}(d,\mathcal{S},q)\in[0,1]
|
|
1638
|
+
$$
|
|
1639
|
+
|
|
1640
|
+
Degeneracy cases:
|
|
1641
|
+
|
|
1642
|
+
| Condition | Behavior |
|
|
1643
|
+
| --- | --- |
|
|
1644
|
+
| $\xi(q)=0$ | $S_{\mathrm{proj}} = S_{\mathrm{final}}(d)$; standard retrieval unchanged |
|
|
1645
|
+
| $\xi(q)=1$, $\nu=\rho=0$, $\mu=1$ | Explicit no-op configuration; still $S_{\mathrm{proj}} = S_{\mathrm{final}}(d)$ |
|
|
1646
|
+
| $|E(q)|=0$ | $\Delta\Phi=0$ for all $d$; the $\rho$ term vanishes |
|
|
1647
|
+
| $\mathcal{S}=\emptyset$ | $\Delta\Phi$ equals full slot-coverage fraction |
|
|
1648
|
+
| all slots already covered by $\mathcal{S}$ | $\Delta\Phi=0$ for all remaining $d$ |
|
|
1649
|
+
|
|
1650
|
+
Note: the greedy selector below optimizes a submodular coverage term
|
|
1651
|
+
$\Delta\Phi$ augmented with fixed document priors $S_{\mathrm{final}}(d)$ and
|
|
1652
|
+
$A(d)$. The classic $(1-1/e)$ approximation guarantee applies strictly to the
|
|
1653
|
+
coverage component; in practice the blended score preserves greedy usefulness
|
|
1654
|
+
for temporal anchor selection.
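
The blend can be checked numerically with a short sketch. The function and type names (`coverageScore`, `projScore`, `CoverageWeights`) are hypothetical; only the formulas and default weights are taken from the text above.

```typescript
interface CoverageWeights {
  mu: number;
  nu: number;
  rho: number;
}

// Default shipped weights from the text; they sum to 1.
const DEFAULT_WEIGHTS: CoverageWeights = { mu: 0.6, nu: 0.2, rho: 0.2 };

// S_cov = mu * S_final + nu * A + rho * DeltaPhi
function coverageScore(
  sFinal: number,
  anchor: number,
  deltaPhi: number,
  w: CoverageWeights = DEFAULT_WEIGHTS,
): number {
  return w.mu * sFinal + w.nu * anchor + w.rho * deltaPhi;
}

// S_proj = (1 - xi) * S_final + xi * S_cov
function projScore(
  sFinal: number,
  anchor: number,
  deltaPhi: number,
  xi: number,
  w: CoverageWeights = DEFAULT_WEIGHTS,
): number {
  return (1 - xi) * sFinal + xi * coverageScore(sFinal, anchor, deltaPhi, w);
}
```

As an illustration with made-up inputs and $\xi(q)=1$: a broad turn with $S_{\mathrm{final}}=0.7$, $A=0$, $\Delta\Phi=0.5$ scores $0.42+0+0.10=0.52$, while an anchor turn with $S_{\mathrm{final}}=0.4$, $A=1$, $\Delta\Phi=0.5$ scores $0.24+0.20+0.10=0.54$, so the anchor turn ranks first even though its standalone semantic score is lower.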
|
|
1655
|
+
|
|
1656
|
+
### 9.6 Temporal Recovery Candidate Set
|
|
1657
|
+
$\mathcal{C}_{\mathrm{rec}}(q)$
|
|
1658
|
+
|
|
1659
|
+
The root cause of the observed benchmark failure is not only that documents are
|
|
1660
|
+
scored incorrectly; it is also that the necessary complementary anchor turn may
|
|
1661
|
+
never enter $\mathcal{C}_2(q)$ because its semantic similarity to the
|
|
1662
|
+
whole-query embedding is too low.
|
|
1663
|
+
|
|
1664
|
+
A bounded recovery pass admits anchor-rich documents below the normal Pass-1
|
|
1665
|
+
threshold:
|
|
1666
|
+
|
|
1667
|
+
$$
|
|
1668
|
+
\mathcal{C}_{\mathrm{rec}}(q)
|
|
1669
|
+
=
|
|
1670
|
+
\mathrm{TopK}_{d \in
|
|
1671
|
+
\left\{d' \in \mathcal{V}_{\mathrm{rest}} :
|
|
1672
|
+
\mathrm{sim}(q,d') \ge \theta_{\mathrm{rec}}\right\}}
|
|
1673
|
+
\left(k_{\mathrm{rec}},\, A(d)\right)
|
|
1674
|
+
\setminus \mathcal{C}_2(q)
|
|
1675
|
+
$$
|
|
1676
|
+
|
|
1677
|
+
where:
|
|
1678
|
+
|
|
1679
|
+
- $\theta_{\mathrm{rec}} < \theta_1$ is a looser semantic floor, default
|
|
1680
|
+
$\theta_{\mathrm{rec}} = 0.15$, preventing pure noise while still admitting
|
|
1681
|
+
anchor-heavy but semantically distant turns.
|
|
1682
|
+
- $k_{\mathrm{rec}}$ is a small cap, default $k_{\mathrm{rec}} = 10$, bounding
|
|
1683
|
+
the recovery pass's contribution to the candidate pool at $k_{\mathrm{rec}}$ documents.
|
|
1684
|
+
|
|
1685
|
+
The combined candidate pool for the greedy selector is:
|
|
1686
|
+
|
|
1687
|
+
$$
|
|
1688
|
+
\mathcal{C}_{\mathrm{pool}}(q)
|
|
1689
|
+
=
|
|
1690
|
+
\mathcal{C}_2(q)\cup\mathcal{C}_{\mathrm{rec}}(q)
|
|
1691
|
+
$$
|
|
1692
|
+
|
|
1693
|
+
By construction,
|
|
1694
|
+
$\mathcal{C}_{\mathrm{pool}}(q)\subseteq\mathcal{V}_{\mathrm{rest}}$, so
|
|
1695
|
+
partition integrity is preserved.
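
A minimal sketch of the recovery pass, assuming each candidate already carries its query similarity and its cached $A(d)$. The `Candidate` shape and the function names are hypothetical. The sketch follows the formula as written, applying the set difference after the top-$k$ step, so it may return fewer than $k_{\mathrm{rec}}$ documents.

```typescript
interface Candidate {
  id: string;
  sim: number; // sim(q, d)
  anchorDensity: number; // cached A(d)
}

// C_rec(q): top-k_rec by A(d) among documents above the looser semantic
// floor, minus anything already present in C_2(q).
function recoveryCandidates(
  rest: ReadonlyArray<Candidate>,
  c2Ids: ReadonlyArray<string>,
  thetaRec: number = 0.15,
  kRec: number = 10,
): Candidate[] {
  const topK = rest
    .filter((d) => d.sim >= thetaRec) // filter() returns a fresh array
    .sort((a, b) => b.anchorDensity - a.anchorDensity)
    .slice(0, kRec);
  return topK.filter((d) => c2Ids.indexOf(d.id) === -1);
}

// C_pool(q) = C_2(q) union C_rec(q); the two inputs are disjoint by construction.
function candidatePool(
  c2: ReadonlyArray<Candidate>,
  rec: ReadonlyArray<Candidate>,
): Candidate[] {
  return c2.concat(rec);
}
```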
|
|
1696
|
+
|
|
1697
|
+
### 9.7 Greedy Coverage-Aware Selector
|
|
1698
|
+
|
|
1699
|
+
Given $\mathcal{C}_{\mathrm{pool}}(q)$, the selector builds the final chosen
|
|
1700
|
+
set greedily, using the same rank-then-prefix-accept spirit as the existing
|
|
1701
|
+
token-budget packing in Section 7.8.
|
|
1702
|
+
|
|
1703
|
+
Let $k_{\mathrm{cov}}\le k_2$ be the maximum number of anchor turns to select,
|
|
1704
|
+
default $k_{\mathrm{cov}}=3$.
|
|
1705
|
+
|
|
1706
|
+
Initialize:
|
|
1707
|
+
|
|
1708
|
+
$$
|
|
1709
|
+
\mathcal{S}_0 = \emptyset
|
|
1710
|
+
$$
|
|
1711
|
+
|
|
1712
|
+
For $i = 0, 1, \dots, k_{\mathrm{cov}}-1$:
|
|
1713
|
+
|
|
1714
|
+
$$
|
|
1715
|
+
d_i^*
|
|
1716
|
+
=
|
|
1717
|
+
\underset{d \in \mathcal{C}_{\mathrm{pool}}(q)\setminus\mathcal{S}_i}{\arg\max}
|
|
1718
|
+
\;
|
|
1719
|
+
S_{\mathrm{proj}}(d, \mathcal{S}_i, q)
|
|
1720
|
+
$$
|
|
1721
|
+
|
|
1722
|
+
Early stop if:
|
|
1723
|
+
|
|
1724
|
+
$$
|
|
1725
|
+
S_{\mathrm{proj}}(d_i^*, \mathcal{S}_i, q) < \theta_{\mathrm{stop}}
|
|
1726
|
+
$$
|
|
1727
|
+
|
|
1728
|
+
with default $\theta_{\mathrm{stop}}=0.10$. Otherwise:
|
|
1729
|
+
|
|
1730
|
+
$$
|
|
1731
|
+
\mathcal{S}_{i+1} = \mathcal{S}_i \cup \{d_i^*\}
|
|
1732
|
+
$$
|
|
1733
|
+
|
|
1734
|
+
The final selected set is $\mathcal{S}^*(q)=\mathcal{S}_{k_{\mathrm{cov}}}$, or the earlier set $\mathcal{S}_i$ at which
|
|
1735
|
+
early stopping triggered.
|
|
1736
|
+
|
|
1737
|
+
Each greedy step scans at most
|
|
1738
|
+
$|\mathcal{C}_{\mathrm{pool}}(q)| \le k_2 + k_{\mathrm{rec}}$ candidates.
|
|
1739
|
+
Total complexity is therefore:
|
|
1740
|
+
|
|
1741
|
+
$$
|
|
1742
|
+
O\!\left(k_{\mathrm{cov}} \cdot (k_2 + k_{\mathrm{rec}})\right)
|
|
1743
|
+
$$
|
|
1744
|
+
|
|
1745
|
+
which is negligible relative to embedding and vector-search cost.
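
The greedy loop can be sketched as follows, assuming slot matches $\phi_j(d)$ are precomputed as a boolean array per candidate. The `PoolDoc` shape and the `greedySelect` name are hypothetical, and the weights are the defaults quoted earlier in this section.

```typescript
interface PoolDoc {
  id: string;
  sFinal: number; // S_final(d)
  anchor: number; // A(d)
  slots: ReadonlyArray<boolean>; // slots[j] is phi_j(d)
}

function greedySelect(
  pool: ReadonlyArray<PoolDoc>,
  xi: number,
  slotCount: number,
  kCov: number = 3,
  thetaStop: number = 0.1,
): PoolDoc[] {
  const mu = 0.6;
  const nu = 0.2;
  const rho = 0.2;
  const selected: PoolDoc[] = [];
  const covered: boolean[] = [];
  for (let j = 0; j < slotCount; j++) covered.push(false);

  // S_proj(d, S_i, q), with DeltaPhi computed against the covered slots.
  const score = (d: PoolDoc): number => {
    let fresh = 0;
    for (let j = 0; j < slotCount; j++) {
      if (d.slots[j] && !covered[j]) fresh += 1;
    }
    const deltaPhi = fresh / Math.max(slotCount, 1);
    const sCov = mu * d.sFinal + nu * d.anchor + rho * deltaPhi;
    return (1 - xi) * d.sFinal + xi * sCov;
  };

  for (let i = 0; i < kCov; i++) {
    let best: PoolDoc | undefined;
    let bestScore = -Infinity;
    for (const d of pool) {
      if (selected.indexOf(d) !== -1) continue; // only scan C_pool \ S_i
      const s = score(d);
      if (s > bestScore) {
        best = d;
        bestScore = s;
      }
    }
    // Stop when the pool is exhausted or the best score is below the floor.
    if (best === undefined || bestScore < thetaStop) break;
    selected.push(best);
    best.slots.forEach((hit, j) => {
      if (hit) covered[j] = true;
    });
  }
  return selected;
}
```

In a two-slot example with a broad turn, a near-duplicate of it, and a date-bearing turn covering the other slot, this sketch picks the date-bearing turn and the broad turn when $\xi(q)=1$, whereas ranking by $S_{\mathrm{final}}$ alone would pick the broad turn and its near-duplicate.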
|
|
1746
|
+
|
|
1747
|
+
### 9.8 Modified Projection Operator
|
|
1748
|
+
|
|
1749
|
+
The temporal extension redefines $\mathrm{Proj}$ conditionally:
|
|
1750
|
+
|
|
1751
|
+
$$
|
|
1752
|
+
\mathrm{Proj}(\mathcal{V}_{\mathrm{rest}}, q)
|
|
1753
|
+
=
|
|
1754
|
+
\begin{cases}
|
|
1755
|
+
\mathcal{S}^*(q)\cup\mathcal{C}_{hop}^{*}(q)
|
|
1756
|
+
& \text{if } \xi(q) \ge \theta_\xi \\[4pt]
|
|
1757
|
+
\mathcal{C}_2(q)\cup\mathcal{C}_{hop}^{*}(q)
|
|
1758
|
+
& \text{otherwise}
|
|
1759
|
+
\end{cases}
|
|
1760
|
+
$$
|
|
1761
|
+
|
|
1762
|
+
The assembly law and budget equations remain unchanged:
|
|
1763
|
+
|
|
1764
|
+
$$
|
|
1765
|
+
C_{\mathrm{total}}(q)=\mathcal{I}_1\cup\mathcal{I}_2^{*}\cup T_{\mathrm{recent}}\cup \mathrm{Proj}(\mathcal{V}_{\mathrm{rest}}, q)
|
|
1766
|
+
$$
|
|
1767
|
+
|
|
1768
|
+
$$
|
|
1769
|
+
\tau_{\mathcal{V}}(q)
|
|
1770
|
+
=
|
|
1771
|
+
\tau-\tau_{\mathcal{I}_1}
|
|
1772
|
+
-\sum_{d\in\mathcal{I}_2^{*}}\mathrm{toks}(d)
|
|
1773
|
+
-\sum_{d\in T_{\mathrm{recent}}}\mathrm{toks}(d)
|
|
1774
|
+
$$
|
|
1775
|
+
|
|
1776
|
+
Documents in $\mathrm{Proj}(\mathcal{V}_{\mathrm{rest}}, q)$ are injected in
|
|
1777
|
+
descending $\sigma(d)$ order until $\tau_{\mathcal{V}}(q)$ is exhausted.
|
|
1778
|
+
|
|
1779
|
+
For documents entering through the temporal selector, the merged score sequence
|
|
1780
|
+
is extended:
|
|
1781
|
+
|
|
1782
|
+
$$
|
|
1783
|
+
\sigma(d)=
|
|
1784
|
+
\begin{cases}
|
|
1785
|
+
S_{\mathrm{proj}}(d, \mathcal{S}^*\setminus\{d\}, q)
|
|
1786
|
+
& d\in\mathcal{S}^*(q) \\
|
|
1787
|
+
S_{hop}(d)
|
|
1788
|
+
& d\in\mathcal{C}_{hop}^{*}(q)
|
|
1789
|
+
\end{cases}
|
|
1790
|
+
$$
|
|
1791
|
+
|
|
1792
|
+
For documents that were already present in $\mathcal{C}_2(q)$, the standard
|
|
1793
|
+
$S_{\mathrm{final}}(d)$ path remains authoritative and duplicates are excluded
|
|
1794
|
+
by construction.
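
The injection rule above (descending $\sigma(d)$ until $\tau_{\mathcal{V}}(q)$ is exhausted) can be sketched as a rank-then-prefix-accept loop. The `Scored` shape and `packByBudget` name are hypothetical, and stopping at the first document that does not fit is an assumption about the Section 7.8 packing contract rather than a confirmed detail.

```typescript
interface Scored {
  id: string;
  sigma: number; // merged score sigma(d)
  tokens: number; // toks(d)
}

// Rank by sigma descending, then accept a prefix that fits the budget.
function packByBudget(docs: ReadonlyArray<Scored>, budget: number): Scored[] {
  const ranked = docs.slice().sort((a, b) => b.sigma - a.sigma);
  const accepted: Scored[] = [];
  let remaining = budget;
  for (const d of ranked) {
    // Assumption: packing stops at the first overflow instead of skipping ahead.
    if (d.tokens > remaining) break;
    accepted.push(d);
    remaining -= d.tokens;
  }
  return accepted;
}
```

With a zero budget and documents of positive token size, the loop accepts nothing, which matches the $\tau_{\mathcal{V}}(q)=0$ edge case listed in Section 9.9.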
|
|
1795
|
+
|
|
1796
|
+
### 9.9 Preservation of Section 7.10 Runtime Invariants
|
|
1797
|
+
|
|
1798
|
+
All runtime invariants from Section 7.10 remain preserved:
|
|
1799
|
+
|
|
1800
|
+
1. Invariant completeness is unaffected because $\mathcal{I}_1$ injection is
|
|
1801
|
+
independent of $\mathrm{Proj}$.
|
|
1802
|
+
2. Soft invariant order preservation is unaffected because
|
|
1803
|
+
$\mathcal{I}_2^{*}$ is unchanged.
|
|
1804
|
+
3. Partition integrity is preserved because
|
|
1805
|
+
$\mathcal{C}_{\mathrm{rec}}\subseteq\mathcal{V}_{\mathrm{rest}}$ and
|
|
1806
|
+
$\mathcal{S}^*\subseteq\mathcal{C}_{\mathrm{pool}}
|
|
1807
|
+
\subseteq\mathcal{V}_{\mathrm{rest}}$.
|
|
1808
|
+
4. Mandatory recent-tail completeness is unaffected because
|
|
1809
|
+
$T_{\mathrm{base}}\subseteq T_{\mathrm{recent}}$ remains independent of
|
|
1810
|
+
$\mathrm{Proj}$.
|
|
1811
|
+
5. Score boundedness is preserved because
|
|
1812
|
+
$S_{\mathrm{proj}}(d,\mathcal{S},q)\in[0,1]$.
|
|
1813
|
+
6. Token budget respect is preserved because the result still flows through the
|
|
1814
|
+
same residual variant budget and greedy token packing contract.
|
|
1815
|
+
7. Compaction boundary safety is preserved because
|
|
1816
|
+
$\mathcal{S}^*\subseteq\mathcal{V}_{\mathrm{rest}}$.
|
|
1817
|
+
8. Hop termination is unchanged because $\mathcal{C}_{hop}^{*}(q)$ is defined
|
|
1818
|
+
identically.
|
|
1819
|
+
9. Edge-case safety is preserved by the guards below.
|
|
1820
|
+
|
|
1821
|
+
Edge-case additions:
|
|
1822
|
+
|
|
1823
|
+
- $\mathcal{C}_{\mathrm{pool}}(q)=\emptyset$: the greedy selector returns
|
|
1824
|
+
$\mathcal{S}^*=\emptyset$ and $\mathrm{Proj}$ reduces to
|
|
1825
|
+
$\mathcal{C}_{hop}^{*}(q)$ only.
|
|
1826
|
+
- $|E(q)|=0$: the denominator in $\Delta\Phi$ uses $\max(|E(q)|,1)$, so no
|
|
1827
|
+
division by zero is possible.
|
|
1828
|
+
- $\xi(q)<\theta_\xi$: the conditional routes directly to the existing
|
|
1829
|
+
$\mathcal{C}_2(q)\cup\mathcal{C}_{hop}^{*}(q)$ behavior.
|
|
1830
|
+
- $\tau_{\mathcal{V}}(q)=0$: the selector may compute $\mathcal{S}^*$, but
|
|
1831
|
+
packing injects zero documents and the budget invariant still holds.
|
|
1832
|
+
|
|
1833
|
+
### 9.10 Symbol Table (Section 9 Additions)
|
|
1834
|
+
|
|
1835
|
+
| Symbol | Domain | Meaning |
|
|
1836
|
+
| --- | --- | --- |
|
|
1837
|
+
| $\xi(q)$ | $[0,1]$ | Temporal-compositional query indicator |
|
|
1838
|
+
| $\theta_\xi$ | $(0,1)$ | Activation threshold for temporal mode |
|
|
1839
|
+
| $\theta_{\xi}^{\mathrm{norm}}$ | $(0,\infty)$ | Saturation normalization for $\xi$ |
|
|
1840
|
+
| $A(d)$ | $[0,1]$ | Temporal anchor density of document $d$ |
|
|
1841
|
+
| $\theta_A^{\mathrm{norm}}$ | $(0,\infty)$ | Saturation normalization for $A$ |
|
|
1842
|
+
| $E(q)$ | ordered tuple | Event-slot sequence extracted from $q$ |
|
|
1843
|
+
| $\phi_j(d)$ | $\{0,1\}$ | Binary slot-match indicator |
|
|
1844
|
+
| $\theta_e$ | $[-1,1]$ | Slot-match similarity threshold |
|
|
1845
|
+
| $\Delta\Phi(d,\mathcal{S},q)$ | $[0,1]$ | Marginal event-slot coverage |
|
|
1846
|
+
| $\mu,\nu,\rho$ | $[0,1]$, sum to 1 | Coverage score weights |
|
|
1847
|
+
| $S_{\mathrm{cov}}(d,\mathcal{S},q)$ | $[0,1]$ | Coverage-augmented score |
|
|
1848
|
+
| $S_{\mathrm{proj}}(d,\mathcal{S},q)$ | $[0,1]$ | Final blended projection score |
|
|
1849
|
+
| $\mathcal{C}_{\mathrm{rec}}(q)$ | $\subseteq\mathcal{V}_{\mathrm{rest}}$ | Recovery candidate set |
|
|
1850
|
+
| $\theta_{\mathrm{rec}}$ | $[-1,1]$ | Semantic floor for recovery pass |
|
|
1851
|
+
| $k_{\mathrm{rec}}$ | $\mathbb{Z}_{>0}$ | Recovery set size cap |
|
|
1852
|
+
| $\mathcal{C}_{\mathrm{pool}}(q)$ | $\subseteq\mathcal{V}_{\mathrm{rest}}$ | Combined greedy input pool |
|
|
1853
|
+
| $k_{\mathrm{cov}}$ | $\mathbb{Z}_{>0}, \le k_2$ | Maximum anchor turns to select |
|
|
1854
|
+
| $\theta_{\mathrm{stop}}$ | $[0,1]$ | Early-stop floor for greedy selector |
|
|
1855
|
+
| $\mathcal{S}^*(q)$ | $\subseteq\mathcal{C}_{\mathrm{pool}}$ | Greedy-selected coverage-aware anchor set |
|
|
1856
|
+
|
|
1857
|
+
### 9.11 Relationship to Existing Sections
|
|
1858
|
+
|
|
1859
|
+
This section is an extension, not a replacement:
|
|
1860
|
+
|
|
1861
|
+
- Section 1 hybrid score $\mathrm{score}(d)$ is unchanged and still feeds
|
|
1862
|
+
$S_{\mathrm{final}}(d)$ as before.
|
|
1863
|
+
- Section 7.5 $S_{\mathrm{final}}(d)$ is the first input to
|
|
1864
|
+
$S_{\mathrm{proj}}$; when $\xi(q)=0$, the two are identical.
|
|
1865
|
+
- Section 7.7 hop expansion $\mathcal{C}_{hop}^{*}$ is unchanged and is
|
|
1866
|
+
unioned with $\mathcal{S}^*$ exactly as before.
|
|
1867
|
+
- Section 7.8 budget arithmetic is unchanged; $\mathrm{Proj}$ is still bounded
|
|
1868
|
+
by $\tau_{\mathcal{V}}(q)$ and still greedy-packed.
|
|
1869
|
+
- [`gating.md`](./gating.md) inspired the saturating-sum pattern for $\xi(q)$,
|
|
1870
|
+
but the two operate on different objects and at different pipeline stages.
|
|
1871
|
+
- [`ast-v2.md`](./ast-v2.md) Section 7's document-addressed cache $\Psi$ should
|
|
1872
|
+
be extended to store the precomputed $A(d)$ value alongside existing tier and
|
|
1873
|
+
budget metadata.
|