convmemory-0.4.0.tar.gz
- convmemory-0.4.0/LICENSE +21 -0
- convmemory-0.4.0/PKG-INFO +517 -0
- convmemory-0.4.0/README.md +489 -0
- convmemory-0.4.0/convmemory/__init__.py +35 -0
- convmemory-0.4.0/convmemory/api.py +733 -0
- convmemory-0.4.0/convmemory/ccge.py +391 -0
- convmemory-0.4.0/convmemory/encoder.py +150 -0
- convmemory-0.4.0/convmemory/hub.py +45 -0
- convmemory-0.4.0/convmemory/metrics.py +14 -0
- convmemory-0.4.0/convmemory/models.py +31 -0
- convmemory-0.4.0/convmemory/reranker.py +253 -0
- convmemory-0.4.0/convmemory/routing.py +208 -0
- convmemory-0.4.0/convmemory/scoring.py +314 -0
- convmemory-0.4.0/convmemory.egg-info/PKG-INFO +517 -0
- convmemory-0.4.0/convmemory.egg-info/SOURCES.txt +22 -0
- convmemory-0.4.0/convmemory.egg-info/dependency_links.txt +1 -0
- convmemory-0.4.0/convmemory.egg-info/requires.txt +9 -0
- convmemory-0.4.0/convmemory.egg-info/top_level.txt +1 -0
- convmemory-0.4.0/pyproject.toml +42 -0
- convmemory-0.4.0/setup.cfg +4 -0
- convmemory-0.4.0/tests/test_api_smoke.py +122 -0
- convmemory-0.4.0/tests/test_ccge.py +139 -0
- convmemory-0.4.0/tests/test_hub_loading.py +45 -0
- convmemory-0.4.0/tests/test_reranker.py +22 -0
convmemory-0.4.0/LICENSE
ADDED
|
@@ -0,0 +1,21 @@
|
|
|
1
|
+
MIT License
|
|
2
|
+
|
|
3
|
+
Copyright (c) 2026 ConvMemory contributors
|
|
4
|
+
|
|
5
|
+
Permission is hereby granted, free of charge, to any person obtaining a copy
|
|
6
|
+
of this software and associated documentation files (the "Software"), to deal
|
|
7
|
+
in the Software without restriction, including without limitation the rights
|
|
8
|
+
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
|
|
9
|
+
copies of the Software, and to permit persons to whom the Software is
|
|
10
|
+
furnished to do so, subject to the following conditions:
|
|
11
|
+
|
|
12
|
+
The above copyright notice and this permission notice shall be included in all
|
|
13
|
+
copies or substantial portions of the Software.
|
|
14
|
+
|
|
15
|
+
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
|
|
16
|
+
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
|
|
17
|
+
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
|
|
18
|
+
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
|
|
19
|
+
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
|
|
20
|
+
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
|
|
21
|
+
SOFTWARE.
|
|
convmemory-0.4.0/PKG-INFO
ADDED
|
@@ -0,0 +1,517 @@
|
|
|
1
|
+
Metadata-Version: 2.1
|
|
2
|
+
Name: convmemory
|
|
3
|
+
Version: 0.4.0
|
|
4
|
+
Summary: Lightweight temporal memory reranking for long-term conversational memory.
|
|
5
|
+
Author: ConvMemory contributors
|
|
6
|
+
License: MIT
|
|
7
|
+
Project-URL: Homepage, https://github.com/pth2002/ConvMemory
|
|
8
|
+
Project-URL: Issues, https://github.com/pth2002/ConvMemory/issues
|
|
9
|
+
Keywords: memory,retrieval,reranking,rag,agents
|
|
10
|
+
Classifier: Development Status :: 4 - Beta
|
|
11
|
+
Classifier: Intended Audience :: Developers
|
|
12
|
+
Classifier: Intended Audience :: Science/Research
|
|
13
|
+
Classifier: Programming Language :: Python :: 3
|
|
14
|
+
Classifier: Programming Language :: Python :: 3.10
|
|
15
|
+
Classifier: Programming Language :: Python :: 3.11
|
|
16
|
+
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
|
|
17
|
+
Requires-Python: >=3.10
|
|
18
|
+
Description-Content-Type: text/markdown
|
|
19
|
+
License-File: LICENSE
|
|
20
|
+
Requires-Dist: numpy>=1.24
|
|
21
|
+
Requires-Dist: torch>=2.0
|
|
22
|
+
Requires-Dist: sentence-transformers>=2.2.0
|
|
23
|
+
Requires-Dist: transformers>=4.30
|
|
24
|
+
Requires-Dist: tqdm>=4.60
|
|
25
|
+
Requires-Dist: scikit-learn>=1.2
|
|
26
|
+
Provides-Extra: hub
|
|
27
|
+
Requires-Dist: huggingface_hub>=0.20; extra == "hub"
|
|
28
|
+
|
|
29
|
+
# ConvMemory
|
|
30
|
+
|
|
31
|
+
|
|
32
|
+
|
|
33
|
+
|
|
34
|
+
|
|
35
|
+
ConvMemory is a lightweight learned memory reranker for long-term
|
|
36
|
+
conversational and agent memory.
|
|
37
|
+
|
|
38
|
+
It runs after vector search and before prompt construction:
|
|
39
|
+
|
|
40
|
+
```text
|
|
41
|
+
user query -> vector search top-k -> ConvMemory -> memory context
|
|
42
|
+
```
|
|
43
|
+
|
|
44
|
+
ConvMemory is not a vector database, a full QA system, or a general document
|
|
45
|
+
reranker. Its intended use is recall-oriented memory selection for structured
|
|
46
|
+
memory streams: conversations, user histories, agent traces, task logs, and
|
|
47
|
+
session-level notes.
|
|
48
|
+
|
|
49
|
+
Current package version: `0.4.0`
|
|
50
|
+
|
|
51
|
+
## When To Use It
|
|
52
|
+
|
|
53
|
+
Use ConvMemory when:
|
|
54
|
+
|
|
55
|
+
- your memory store has session or user-history structure;
|
|
56
|
+
- raw vector search misses important neighboring evidence;
|
|
57
|
+
- you need a cheaper high-recall stage before an optional cross-encoder pass;
|
|
58
|
+
- the downstream agent can benefit from a compact, reranked memory context.
|
|
59
|
+
|
|
60
|
+
Do not use ConvMemory as:
|
|
61
|
+
|
|
62
|
+
- a vector database;
|
|
63
|
+
- a general web/document reranker;
|
|
64
|
+
- an end-to-end answer-generation model;
|
|
65
|
+
- a universal replacement for modern cross-encoders.
|
|
66
|
+
|
|
67
|
+
The strongest current deployment pattern is a cascade:
|
|
68
|
+
|
|
69
|
+
```text
|
|
70
|
+
vector top500 -> ConvMemory candidate stage -> optional small cross-encoder -> memory context
|
|
71
|
+
```
|
|
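The cascade above can be sketched in plain Python. This is only an illustration of the control flow, not ConvMemory's own code: `cascade`, `vector_search`, `candidate_stage`, `cross_encoder`, and the `stage_k` budget are hypothetical names, and every stage is a callable supplied by the caller.

```python
from typing import Callable, Optional, Sequence

Memory = dict  # assumed shape: {"id": str, "text": str}


def cascade(
    query: str,
    vector_search: Callable[[str, int], Sequence[Memory]],
    candidate_stage: Callable[[str, Sequence[Memory], int], Sequence[Memory]],
    cross_encoder: Optional[Callable[[str, Sequence[Memory]], Sequence[Memory]]] = None,
    pool_size: int = 500,
    stage_k: int = 50,
    final_k: int = 15,
) -> list[Memory]:
    """Hypothetical three-stage cascade; each stage is injected by the caller."""
    # Stage 1: cheap, wide candidate pool from the vector index.
    pool = vector_search(query, pool_size)
    # Stage 2: learned reranking narrows the pool to a short list.
    shortlist = candidate_stage(query, pool, stage_k)
    # Stage 3 (optional): the expensive pairwise scorer only sees the short list.
    if cross_encoder is not None:
        shortlist = cross_encoder(query, shortlist)
    return list(shortlist)[:final_k]
```

The point of the ordering is cost: the pairwise cross-encoder, if used at all, scores `stage_k` items instead of the full pool.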
72
|
+
|
|
73
|
+
Public alpha: the CCGE-LA conflict editor can be inserted after ConvMemory to
|
|
74
|
+
repair stale/current memory conflicts. See [CCGE-LA](docs/CCGE_LA.md) and the
|
|
75
|
+
broader [research trajectory](docs/RESEARCH_TRAJECTORY.md). The API is
|
|
76
|
+
available in `convmemory.ccge`; the alpha editor checkpoint is published on
|
|
77
|
+
Hugging Face Hub for opt-in use.
|
|
78
|
+
|
|
79
|
+
## Installation
|
|
80
|
+
|
|
81
|
+
```bash
|
|
82
|
+
pip install convmemory
|
|
83
|
+
```
|
|
84
|
+
|
|
85
|
+
For development from source:
|
|
86
|
+
|
|
87
|
+
```bash
|
|
88
|
+
git clone https://github.com/pth2002/ConvMemory.git
|
|
89
|
+
cd ConvMemory
|
|
90
|
+
pip install -e .
|
|
91
|
+
pip install -r requirements.txt
|
|
92
|
+
```
|
|
93
|
+
|
|
94
|
+
ConvMemory requires Python 3.10 or later.
|
|
95
|
+
|
|
96
|
+
## Checkpoints
|
|
97
|
+
|
|
98
|
+
The public LoCoMo MPNet checkpoint is available from Hugging Face Hub:
|
|
99
|
+
|
|
100
|
+
[Purdy0228/ConvMemory-LoCoMo-MPNet](https://huggingface.co/Purdy0228/ConvMemory-LoCoMo-MPNet)
|
|
101
|
+
|
|
102
|
+
```python
|
|
103
|
+
from convmemory import ConvMemory
|
|
104
|
+
|
|
105
|
+
model = ConvMemory.from_pretrained("Purdy0228/ConvMemory-LoCoMo-MPNet")
|
|
106
|
+
```
|
|
107
|
+
|
|
108
|
+
The same checkpoint is also distributed as a GitHub release asset:
|
|
109
|
+
|
|
110
|
+
[Download `convmemory-locomo-mpnet.zip`](https://github.com/pth2002/ConvMemory/releases/download/v0.1.0/convmemory-locomo-mpnet.zip)
|
|
111
|
+
|
|
112
|
+
Extract it from the repository root:
|
|
113
|
+
|
|
114
|
+
```bash
|
|
115
|
+
mkdir -p checkpoints
|
|
116
|
+
unzip convmemory-locomo-mpnet.zip -d checkpoints
|
|
117
|
+
```
|
|
118
|
+
|
|
119
|
+
On Windows PowerShell:
|
|
120
|
+
|
|
121
|
+
```powershell
|
|
122
|
+
New-Item -ItemType Directory -Force -Path checkpoints
|
|
123
|
+
Expand-Archive .\convmemory-locomo-mpnet.zip -DestinationPath .\checkpoints -Force
|
|
124
|
+
```
|
|
125
|
+
|
|
126
|
+
Expected layout:
|
|
127
|
+
|
|
128
|
+
```text
|
|
129
|
+
checkpoints/convmemory-locomo-mpnet/
|
|
130
|
+
  config.json
|
|
131
|
+
  model.pt
|
|
132
|
+
```
|
|
133
|
+
|
|
134
|
+
Verify the checkpoint:
|
|
135
|
+
|
|
136
|
+
```bash
|
|
137
|
+
python examples/load_pretrained.py
|
|
138
|
+
```
|
|
139
|
+
|
|
140
|
+
The same checkpoint is used by the current package. Newer package versions add
|
|
141
|
+
library and evaluation utilities; they do not require a new weight file.
|
|
142
|
+
|
|
143
|
+
### Optional CCGE-LA Alpha Checkpoint
|
|
144
|
+
|
|
145
|
+
The CCGE-LA conflict editor has a separate alpha checkpoint on Hugging Face Hub:
|
|
146
|
+
|
|
147
|
+
[Purdy0228/ConvMemory-CCGE-LA](https://huggingface.co/Purdy0228/ConvMemory-CCGE-LA)
|
|
148
|
+
|
|
149
|
+
Attach it after loading the base ConvMemory checkpoint:
|
|
150
|
+
|
|
151
|
+
```python
|
|
152
|
+
model = ConvMemory.from_pretrained("Purdy0228/ConvMemory-LoCoMo-MPNet")
|
|
153
|
+
model.load_ccge_editor("Purdy0228/ConvMemory-CCGE-LA")
|
|
154
|
+
```
|
|
155
|
+
|
|
156
|
+
The same editor is also distributed as a GitHub release asset:
|
|
157
|
+
|
|
158
|
+
[Download `convmemory-ccge-la-locomo-mpnet-seed23-alpha.zip`](https://github.com/pth2002/ConvMemory/releases/download/ccge-la-alpha-v0.1/convmemory-ccge-la-locomo-mpnet-seed23-alpha.zip)
|
|
159
|
+
|
|
160
|
+
Extract it from the repository root:
|
|
161
|
+
|
|
162
|
+
```bash
|
|
163
|
+
unzip convmemory-ccge-la-locomo-mpnet-seed23-alpha.zip -d checkpoints
|
|
164
|
+
```
|
|
165
|
+
|
|
166
|
+
If using local files, attach it after loading the base ConvMemory checkpoint:
|
|
167
|
+
|
|
168
|
+
```python
|
|
169
|
+
model = ConvMemory.from_pretrained("checkpoints/convmemory-locomo-mpnet")
|
|
170
|
+
model.load_ccge_editor("checkpoints/convmemory-ccge-la-locomo-mpnet-seed23-alpha")
|
|
171
|
+
```
|
|
172
|
+
|
|
173
|
+
This is an alpha LoCoMo/MPNet editor trained on the seed-23 split. It is useful
|
|
174
|
+
for trying the public CCGE-LA API, but it is not yet the default checkpoint.
|
|
175
|
+
SHA256:
|
|
176
|
+
`459ecfb2b4c35887f1d8f2cdd87dab402c37bd8dee86628655eff08f314b2e7c`.
|
|
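A published digest like the one above can be checked with the standard library. The sketch below is a minimal example under one assumption: the README does not say which file the SHA256 covers, so the file you pass to `verify` (for example the downloaded release archive) is your choice. `sha256_of` and `verify` are illustrative helper names, not part of the package.

```python
import hashlib
from pathlib import Path

# Digest copied from the README text above.
EXPECTED_SHA256 = "459ecfb2b4c35887f1d8f2cdd87dab402c37bd8dee86628655eff08f314b2e7c"


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash the file in chunks so large archives are never fully loaded."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: Path, expected: str = EXPECTED_SHA256) -> bool:
    """Return True only if the file's digest matches the expected hex string."""
    return sha256_of(path) == expected.lower()
```

A mismatch means either a corrupted download or that the digest refers to a different file than the one checked.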
177
|
+
|
|
178
|
+
## Quick Start
|
|
179
|
+
|
|
180
|
+
```python
|
|
181
|
+
from convmemory import ConvMemory
|
|
182
|
+
|
|
183
|
+
model = ConvMemory.from_pretrained(
|
|
184
|
+
"Purdy0228/ConvMemory-LoCoMo-MPNet",
|
|
185
|
+
device="cuda",
|
|
186
|
+
)
|
|
187
|
+
|
|
188
|
+
memories = [
|
|
189
|
+
{"id": "m1", "text": "The user said their hiking trip moved to Sunday."},
|
|
190
|
+
{"id": "m2", "text": "The assistant recommended bringing extra water."},
|
|
191
|
+
{"id": "m3", "text": "The user has an exam next Friday."},
|
|
192
|
+
]
|
|
193
|
+
|
|
194
|
+
results = model.rerank(
|
|
195
|
+
query="When is the hiking trip?",
|
|
196
|
+
memories=memories,
|
|
197
|
+
top_k=2,
|
|
198
|
+
)
|
|
199
|
+
|
|
200
|
+
for item in results:
|
|
201
|
+
    print(item.rank, item.memory_id, item.score, item.text)
|
|
202
|
+
```
|
|
203
|
+
|
|
204
|
+
Pass memories in a stable application order when that order is available.
|
|
205
|
+
|
|
206
|
+
## Agent Memory Integration
|
|
207
|
+
|
|
208
|
+
Most applications call ConvMemory after vector search:
|
|
209
|
+
|
|
210
|
+
```python
|
|
211
|
+
from convmemory import ConvMemory
|
|
212
|
+
|
|
213
|
+
memory_reranker = ConvMemory.from_pretrained(
|
|
214
|
+
"Purdy0228/ConvMemory-LoCoMo-MPNet",
|
|
215
|
+
device="cuda",
|
|
216
|
+
)
|
|
217
|
+
|
|
218
|
+
def retrieve_agent_memory(query, memory_store, top_k=15):
|
|
219
|
+
    candidates = memory_store.vector_search(query, top_k=500)
|
|
220
|
+
|
|
221
|
+
    ranked = memory_reranker.retrieve(
|
|
222
|
+
        query=query,
|
|
223
|
+
        memories=candidates,
|
|
224
|
+
        mode="rerank",
|
|
225
|
+
        top_k=top_k,
|
|
226
|
+
    )
|
|
227
|
+
|
|
228
|
+
    return [
|
|
229
|
+
        {"id": item.memory_id, "text": item.text, "score": item.score}
|
|
230
|
+
        for item in ranked
|
|
231
|
+
    ]
|
|
232
|
+
```
|
|
233
|
+
|
|
234
|
+
If the downstream agent can read a slightly wider context, use `expand` mode.
|
|
235
|
+
It preserves the strongest ConvMemory prefix and fills the remaining budget
|
|
236
|
+
with complementary candidates:
|
|
237
|
+
|
|
238
|
+
```python
|
|
239
|
+
context = memory_reranker.retrieve(
|
|
240
|
+
query=query,
|
|
241
|
+
memories=candidates,
|
|
242
|
+
mode="expand",
|
|
243
|
+
protected_k=10,
|
|
244
|
+
top_k=15,
|
|
245
|
+
)
|
|
246
|
+
```
|
|
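One way to read "preserve the strongest prefix, then fill the remaining budget" is sketched below. It is not the library's implementation: the README does not say how complementary candidates are chosen, so this sketch assumes they come from a second ordering supplied by the caller (for example the raw vector-search order). `expand_with_protected_prefix` and `fallback_ids` are hypothetical names.

```python
def expand_with_protected_prefix(reranked_ids, fallback_ids, protected_k=10, top_k=15):
    """Keep the first protected_k reranked ids, then fill up to top_k.

    Fill items are taken from fallback_ids in order, skipping ids that are
    already in the context. Assumption: fallback_ids is some secondary
    ranking of the same candidate pool.
    """
    # The protected prefix can never exceed the total budget.
    context = list(reranked_ids[: min(protected_k, top_k)])
    seen = set(context)
    for memory_id in fallback_ids:
        if len(context) >= top_k:
            break
        if memory_id not in seen:
            context.append(memory_id)
            seen.add(memory_id)
    return context
```

With `protected_k=10` and `top_k=15`, the first ten positions are exactly the reranker's top ten and at most five further slots go to candidates the reranker ranked lower.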
247
|
+
|
|
248
|
+
For systems that already store embeddings, use `rerank_embeddings` to avoid
|
|
249
|
+
re-encoding the memory store:
|
|
250
|
+
|
|
251
|
+
```python
|
|
252
|
+
ranked = memory_reranker.rerank_embeddings(
|
|
253
|
+
query_embedding=query_embedding,
|
|
254
|
+
memory_embeddings=memory_embeddings,
|
|
255
|
+
memory_ids=memory_ids,
|
|
256
|
+
memory_texts=memory_texts,
|
|
257
|
+
candidate_indices=candidate_indices,
|
|
258
|
+
query=query,
|
|
259
|
+
top_k=20,
|
|
260
|
+
)
|
|
261
|
+
```
|
|
262
|
+
|
|
263
|
+
To enable the CCGE-LA conflict editor after ConvMemory, attach a trained editor
|
|
264
|
+
checkpoint and pass `editor="ccge_la"`:
|
|
265
|
+
|
|
266
|
+
```python
|
|
267
|
+
memory_reranker.load_ccge_editor("Purdy0228/ConvMemory-CCGE-LA")
|
|
268
|
+
|
|
269
|
+
ranked = memory_reranker.retrieve(
|
|
270
|
+
query=query,
|
|
271
|
+
memories=candidates,
|
|
272
|
+
mode="rerank",
|
|
273
|
+
editor="ccge_la",
|
|
274
|
+
top_k=15,
|
|
275
|
+
)
|
|
276
|
+
```
|
|
277
|
+
|
|
278
|
+
The checkpoint and embeddings must use the same embedding model family and
|
|
279
|
+
embedding dimension.
|
|
280
|
+
|
|
281
|
+
## Results
|
|
282
|
+
|
|
283
|
+
These are retrieval-stage evaluations. They measure whether annotated evidence
|
|
284
|
+
memories are retrieved into the top-k list; they do not measure final answer
|
|
285
|
+
generation.
|
|
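For readers unfamiliar with the metrics in the tables, here is a minimal sketch of the usual textbook definitions for a single question. The package's own evaluation code may differ in details such as tie handling or rank truncation; the function names are illustrative.

```python
def recall_at_k(ranked_ids, evidence_ids, k=10):
    """Fraction of annotated evidence memories found in the top-k list."""
    evidence = set(evidence_ids)
    if not evidence:
        return 0.0
    return len(evidence.intersection(ranked_ids[:k])) / len(evidence)


def hit_at_k(ranked_ids, evidence_ids, k=10):
    """1.0 if at least one evidence memory is in the top-k list, else 0.0."""
    evidence = set(evidence_ids)
    return 1.0 if any(mid in evidence for mid in ranked_ids[:k]) else 0.0


def reciprocal_rank(ranked_ids, evidence_ids):
    """1 / rank of the first evidence memory, or 0.0 if none is retrieved."""
    evidence = set(evidence_ids)
    for rank, memory_id in enumerate(ranked_ids, start=1):
        if memory_id in evidence:
            return 1.0 / rank
    return 0.0
```

Dataset-level Recall@10 and MRR are then the means of these per-question values.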
286
|
+
|
|
287
|
+
The tables below are summarized from internal evaluation artifacts. Large
|
|
288
|
+
per-question CSV files, embedding caches, teacher caches, and checkpoints are
|
|
289
|
+
intentionally excluded from the public Git history. See the `docs/` directory
|
|
290
|
+
for the public evaluation protocol, training notes, model card, and negative
|
|
291
|
+
results write-up.
|
|
292
|
+
|
|
293
|
+
Note: v0.40-v0.51 are internal evaluation-iteration identifiers for hardening
|
|
294
|
+
experiments, not packaged PyPI releases. The installable package version is
|
|
295
|
+
0.4.0 and the public base checkpoint is unchanged.
|
|
296
|
+
|
|
297
|
+
Important scope notes:
|
|
298
|
+
|
|
299
|
+
- The public checkpoint is trained on LoCoMo-style data; LoCoMo is in-domain
|
|
300
|
+
for this checkpoint.
|
|
301
|
+
- The headline value proposition is cost-effective learned reranking for memory
|
|
302
|
+
tasks, plus a rigorous negative result about mechanism attribution.
|
|
303
|
+
- Stronger rerankers matter. ConvMemory beats BGE-reranker-base/large on
|
|
304
|
+
LoCoMo Recall@10, but it loses to `mxbai-rerank-large-v1` on both Recall@10
|
|
305
|
+
and MRR.
|
|
306
|
+
- v0.50/v0.51 refute the stronger claim that temporal structure is the reason
|
|
307
|
+
ConvMemory works. The learned temporal window contributes statistically, but
|
|
308
|
+
its benefit is not temporally specific.
|
|
309
|
+
- External OOD results are mixed. The MuSiQue negative result is reported below
|
|
310
|
+
because ConvMemory is not intended as a broad multi-hop document reranker.
|
|
311
|
+
- Latency numbers assume memory embeddings and memory-side indexes are already
|
|
312
|
+
available. Cross-encoder timing includes pairwise scoring through the tested
|
|
313
|
+
`CrossEncoder` path.
|
|
314
|
+
|
|
315
|
+
### LongMemEval Cost Advantage
|
|
316
|
+
|
|
317
|
+
This is the strongest practical story for ConvMemory: on memory-family tasks it
|
|
318
|
+
offers a much cheaper reranking stage while remaining recall-competitive.
|
|
319
|
+
|
|
320
|
+
| Setting | Method | Recall@10 | MRR | ms/query |
|
|
321
|
+
|---|---|---:|---:|---:|
|
|
322
|
+
| Clean500, BGE-large CE | Raw MPNet | 0.9049 | 0.7829 | 0.01 |
|
|
323
|
+
| Clean500, BGE-large CE | BGE-large CE top500 | 0.8807 | 0.8574 | 555.69 |
|
|
324
|
+
| Clean500, BGE-large CE | ConvMemory top500 | 0.9593 | 0.8973 | 44.00 |
|
|
325
|
+
| Clean500, mxbai CE | Raw MPNet | 0.9049 | 0.7829 | 0.01 |
|
|
326
|
+
| Clean500, mxbai CE | mxbai CE top500 | 0.9835 | 0.9317 | 1129.14 |
|
|
327
|
+
| Clean500, mxbai CE | ConvMemory top500 | 0.9593 | 0.8973 | 40.80 |
|
|
328
|
+
| Stress1000 seed23, BGE-large CE | Raw MPNet | 0.5452 | 0.4561 | 0.13 |
|
|
329
|
+
| Stress1000 seed23, BGE-large CE | BGE-large CE top500 | 0.6913 | 0.6651 | 5231.77 |
|
|
330
|
+
| Stress1000 seed23, BGE-large CE | ConvMemory candidate-local | 0.7386 | 0.6125 | 110.71 |
|
|
331
|
+
| Stress1000 seed23, mxbai CE | Raw MPNet | 0.5452 | 0.4561 | 0.12 |
|
|
332
|
+
| Stress1000 seed23, mxbai CE | mxbai CE top500 | 0.8195 | 0.7044 | 11211.63 |
|
|
333
|
+
| Stress1000 seed23, mxbai CE | ConvMemory candidate-local | 0.7386 | 0.6125 | 95.57 |
|
|
334
|
+
|
|
335
|
+
LongMemEval numbers are not seed-averaged: Clean500 is a single run and Stress1000 is reported for a single seed (23). Read these as indicative single-run retrieval-stage checks, not benchmark-grade comparisons.
|
|
336
|
+
|
|
337
|
+
Reading: ConvMemory reranks above BGE-large CE on these memory-family Recall@10
|
|
338
|
+
checks while being about 12-47x faster. It remains below mxbai accuracy, but is
|
|
339
|
+
about 28-117x lower latency in the tested settings.
|
|
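The speedup ranges in the reading can be recomputed directly from the `ms/query` column of the table above. The snippet divides each cross-encoder latency by the ConvMemory latency in the same setting; the dictionary layout is just a convenience for this check.

```python
# (cross-encoder ms/query, ConvMemory ms/query), copied from the table above.
latency_ms = {
    ("Clean500", "BGE-large CE"): (555.69, 44.00),
    ("Clean500", "mxbai CE"): (1129.14, 40.80),
    ("Stress1000 seed23", "BGE-large CE"): (5231.77, 110.71),
    ("Stress1000 seed23", "mxbai CE"): (11211.63, 95.57),
}

# Speedup = baseline latency divided by ConvMemory latency in the same setting.
speedups = {
    key: cross_encoder_ms / convmemory_ms
    for key, (cross_encoder_ms, convmemory_ms) in latency_ms.items()
}

for (setting, baseline), ratio in speedups.items():
    print(f"{setting} vs {baseline}: {ratio:.1f}x")
```

The BGE-large ratios land between roughly 12x and 47x and the mxbai ratios between roughly 28x and 117x, which matches the ranges quoted in the reading.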
340
|
+
|
|
341
|
+
### LoCoMo Strong Cross-Encoder Baselines
|
|
342
|
+
|
|
343
|
+
Five split seeds: 7, 11, 23, 31, 47. Candidate pool: raw dense top500.
|
|
344
|
+
|
|
345
|
+
| Reranker | Recall@10 | Hit@10 | MRR |
|
|
346
|
+
|---|---:|---:|---:|
|
|
347
|
+
| ConvMemory (v0.40 5-seed) | 0.7798 +/- 0.0074 | not reported | 0.5824 |
|
|
348
|
+
| BGE-reranker-base | 0.6967 +/- 0.0126 | 0.7469 +/- 0.0144 | 0.5469 +/- 0.0140 |
|
|
349
|
+
| Jina-reranker-v2-base-multilingual | 0.7411 +/- 0.0103 | 0.7924 +/- 0.0083 | 0.5754 +/- 0.0074 |
|
|
350
|
+
| BGE-reranker-large | 0.7621 +/- 0.0155 | 0.8124 +/- 0.0135 | 0.6120 +/- 0.0144 |
|
|
351
|
+
| mxbai-rerank-large-v1 | 0.8080 +/- 0.0153 | 0.8486 +/- 0.0108 | 0.6687 +/- 0.0093 |
|
|
352
|
+
|
|
353
|
+
Reading: ConvMemory is competitive on recall, but these results do not support
|
|
354
|
+
a claim of overall superiority over cross-encoders. `mxbai-rerank-large-v1` is stronger on
|
|
355
|
+
LoCoMo Recall@10 and MRR.
|
|
356
|
+
|
|
357
|
+
### Retrained Ablation
|
|
358
|
+
|
|
359
|
+
Three split seeds, MPNet. These are retrained ablations, not inference-time
|
|
360
|
+
feature masks.
|
|
361
|
+
|
|
362
|
+
| Variant | Recall@10 | MRR | Delta R@10 vs full |
|
|
363
|
+
|---|---:|---:|---:|
|
|
364
|
+
| full_control | 0.7474 +/- 0.0229 | 0.5343 +/- 0.0160 | 0.0000 |
|
|
365
|
+
| no_router | 0.7491 +/- 0.0213 | 0.5391 +/- 0.0137 | +0.0017 +/- 0.0020 |
|
|
366
|
+
| no_temporal_w1 | 0.7121 +/- 0.0232 | 0.5305 +/- 0.0148 | -0.0353 +/- 0.0052 |
|
|
367
|
+
| no_lexical | 0.6584 +/- 0.0185 | 0.4367 +/- 0.0129 | -0.0890 +/- 0.0061 |
|
|
368
|
+
| no_lexical_no_router | 0.6574 +/- 0.0163 | 0.4342 +/- 0.0127 | -0.0899 +/- 0.0087 |
|
|
369
|
+
|
|
370
|
+
Reading: lexical interaction features are the largest contributor. The
|
|
371
|
+
no-temporal variant is weaker than the full model in this three-seed ablation,
|
|
372
|
+
but this table alone does not prove that the gain is temporally specific. The
|
|
373
|
+
router/DCA scalar contributes approximately zero; removing it is neutral to
|
|
374
|
+
slightly positive, so it is treated as an experimental negative result rather
|
|
375
|
+
than a model feature.
|
|
376
|
+
|
|
377
|
+
### Attribution / Negative Result
|
|
378
|
+
|
|
379
|
+
The v0.50/v0.51 follow-up was designed to test whether the temporal window is
|
|
380
|
+
the load-bearing reason ConvMemory works. This section uses the retrained
|
|
381
|
+
attribution pipeline, not the v0.40 headline pipeline above.
|
|
382
|
+
|
|
383
|
+
Five split seeds: 7, 11, 23, 31, 47.
|
|
384
|
+
|
|
385
|
+
| Method | Recall@10 |
|
|
386
|
+
|---|---:|
|
|
387
|
+
| full_control_retrained | 0.7432 +/- 0.0207 |
|
|
388
|
+
| no_temporal_w1_retrained | 0.7054 +/- 0.0221 |
|
|
389
|
+
| tuned_heuristic | 0.7234 +/- 0.0227 |
|
|
390
|
+
| raw_dense | 0.5345 +/- 0.0210 |
|
|
391
|
+
|
|
392
|
+
Paired bootstrap, `full_control_retrained - no_temporal_w1_retrained`,
|
|
393
|
+
Recall@10:
|
|
394
|
+
|
|
395
|
+
| Slice | Delta | 95% CI | Reading |
|
|
396
|
+
|---|---:|---:|---|
|
|
397
|
+
| ALL | +0.0376 | [+0.0306, +0.0451] | significant |
|
|
398
|
+
| T_SUP_auto | +0.0407 | [+0.0219, +0.0603] | significant, open question |
|
|
399
|
+
| T_REQUIRED_auto | +0.0252 | [+0.0139, +0.0363] | significant |
|
|
400
|
+
| T_HOP_auto | +0.0096 | [-0.0037, +0.0230] | not significant |
|
|
401
|
+
| OTHER | +0.0868 | [+0.0672, +0.1045] | significant |
|
|
402
|
+
| HARD_NON_TEMPORAL_auto | +0.0838 | [+0.0650, +0.1040] | significant |
|
|
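A paired bootstrap confidence interval of the kind reported in the table can be sketched as follows. This is a generic percentile bootstrap, not the project's evaluation script: the README does not state the resampling unit, the number of resamples, or the interval method, so resampling per-question deltas, `n_resamples=2000`, and the percentile interval are all assumptions.

```python
import random


def paired_bootstrap_ci(scores_a, scores_b, n_resamples=2000, alpha=0.05, seed=0):
    """Mean paired delta (a - b) with a percentile bootstrap interval.

    scores_a and scores_b are per-question metric values (e.g. Recall@10)
    for the same questions in the same order.
    """
    if len(scores_a) != len(scores_b) or not scores_a:
        raise ValueError("need two non-empty score lists of equal length")
    deltas = [a - b for a, b in zip(scores_a, scores_b)]
    n = len(deltas)
    rng = random.Random(seed)
    means = []
    for _ in range(n_resamples):
        # Resample questions with replacement, keeping each pair together.
        sample = [deltas[rng.randrange(n)] for _ in range(n)]
        means.append(sum(sample) / n)
    means.sort()
    lo_index = max(0, int(round((alpha / 2) * n_resamples)))
    hi_index = min(n_resamples - 1, int(round((1 - alpha / 2) * n_resamples)) - 1)
    return sum(deltas) / n, means[lo_index], means[hi_index]
```

A slice is read as significant when the interval excludes zero, which is how the `T_HOP_auto` row (interval spanning zero) differs from the others.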
403
|
+
|
|
404
|
+
The honest reading is negative for the original temporal-mechanism thesis: the
|
|
405
|
+
learned temporal window contributes on aggregate, but its benefit is largest on
|
|
406
|
+
hard non-temporal controls (`OTHER` and `HARD_NON_TEMPORAL_auto`) and is not
|
|
407
|
+
statistically significant on the most temporal multi-hop proxy (`T_HOP_auto`).
|
|
408
|
+
This looks more like generic neighborhood/capacity smoothing than proven
|
|
409
|
+
temporal-structure exploitation. `T_SUP_auto` remains the only notable open
|
|
410
|
+
question, but it is still smaller than the hard non-temporal control effect and
|
|
411
|
+
should not be used as a load-bearing temporal claim.
|
|
412
|
+
|
|
413
|
+
Against the tuned heuristic, the same retrained attribution pipeline gives
|
|
414
|
+
`full_control_retrained` a Recall@10 delta of +0.0199 with 95% CI
|
|
415
|
+
[+0.0105, +0.0283], and an MRR delta of +0.0566. So the learned reranker still
|
|
416
|
+
adds value; the negative result is about why it works.
|
|
417
|
+
|
|
418
|
+
See [docs/NEGATIVE_RESULTS.md](docs/NEGATIVE_RESULTS.md) for the full
|
|
419
|
+
v0.50/v0.51 interpretation.
|
|
420
|
+
|
|
421
|
+
For the later current/stale and conflict-editor research trajectory, see
|
|
422
|
+
[docs/RESEARCH_TRAJECTORY.md](docs/RESEARCH_TRAJECTORY.md). That document
|
|
423
|
+
summarizes the internal v0.60+ research line in a public-safe form without raw
|
|
424
|
+
logs, private paths, caches, or exploratory scripts.
|
|
425
|
+
|
|
426
|
+
### Strong-Backbone Retraining
|
|
427
|
+
|
|
428
|
+
Three split seeds. ConvMemory is retrained in each embedding space.
|
|
429
|
+
|
|
430
|
+
| Backbone | Raw Recall@10 | ConvMemory Recall@10 | Gain | ConvMemory MRR |
|
|
431
|
+
|---|---:|---:|---:|---:|
|
|
432
|
+
| BGE-large | 0.6680 +/- 0.0237 | 0.7726 +/- 0.0100 | +0.1046 +/- 0.0137 | 0.5639 +/- 0.0066 |
|
|
433
|
+
| E5-large | 0.7010 +/- 0.0216 | 0.7902 +/- 0.0171 | +0.0892 +/- 0.0052 | 0.5941 +/- 0.0103 |
|
|
434
|
+
|
|
435
|
+
Reading: ConvMemory gains are not just an artifact of a weak MPNet retriever.
|
|
436
|
+
Retraining on stronger embeddings still gives about +9 to +10 Recall@10 points.
|
|
437
|
+
|
|
438
|
+
### External OOD Results
|
|
439
|
+
|
|
440
|
+
Single run per dataset. These are intentionally reported as mixed evidence.
|
|
441
|
+
|
|
442
|
+
| Dataset | Questions | ConvMemory R@10 | Raw dense | Dense + lexical | BM25 |
|
|
443
|
+
|---|---:|---:|---:|---:|---:|
|
|
444
|
+
| QMSum | 272 | 0.5882 | 0.4724 | 0.5423 | 0.5294 |
|
|
445
|
+
| MSC persona | 6155 | 0.9632 | 0.8375 | 0.9765 | 0.9920 |
|
|
446
|
+
| HotpotQA | 1000 | 0.7983 | 0.7682 | 0.8621 | 0.8280 |
|
|
447
|
+
| MuSiQue | 1000 | 0.7635 | 0.8640 | 0.8175 | 0.7245 |
|
|
448
|
+
|
|
449
|
+
These external OOD results are single runs without seed averaging or confidence intervals; treat them as indicative scope checks, not benchmark-grade comparisons.
|
|
450
|
+
|
|
451
|
+
Reading: ConvMemory wins on QMSum and improves strongly over raw dense on MSC,
|
|
452
|
+
but lexical/BM25 baselines dominate MSC's weak persona-overlap labels. On
|
|
453
|
+
HotpotQA, a trivial dense+lexical baseline is stronger. On MuSiQue, ConvMemory
|
|
454
|
+
regresses below raw dense. This is a scope boundary: ConvMemory is a memory
|
|
455
|
+
reranker, not a general multi-hop document reranker.
|
|
456
|
+
|
|
457
|
+
### Where ConvMemory Fails
|
|
458
|
+
|
|
459
|
+
- Non-temporal multi-hop retrieval: MuSiQue is negative against raw dense.
|
|
460
|
+
- Lexically anchored document retrieval: HotpotQA favors dense+lexical scoring.
|
|
461
|
+
- Maximum top-rank precision: mxbai-rerank-large remains stronger on LoCoMo MRR.
|
|
462
|
+
- Cross-query score calibration: scores should not be treated as calibrated
|
|
463
|
+
confidence without application-specific validation.
|
|
464
|
+
- Mechanism attribution: v0.51 does not support temporal structure as the
|
|
465
|
+
load-bearing explanation for ConvMemory's gain.
|
|
466
|
+
|
|
467
|
+
## Reproducibility
|
|
468
|
+
|
|
469
|
+
The current documentation reports the hardened v0.47/v0.51 audit. See:
|
|
470
|
+
|
|
471
|
+
- [docs/EVALUATION_PROTOCOL.md](docs/EVALUATION_PROTOCOL.md)
|
|
472
|
+
- [docs/MODEL_CARD.md](docs/MODEL_CARD.md)
|
|
473
|
+
- [docs/TRAINING.md](docs/TRAINING.md)
|
|
474
|
+
|
|
475
|
+
Main evaluation artifacts are kept outside the repository history. Large
|
|
476
|
+
per-question CSV files, teacher caches, and checkpoints are intentionally not
|
|
477
|
+
committed.
|
|
478
|
+
|
|
479
|
+
## Project Status
|
|
480
|
+
|
|
481
|
+
Stable public API:
|
|
482
|
+
|
|
483
|
+
- `ConvMemory.from_pretrained`
|
|
484
|
+
- `ConvMemory.rerank`
|
|
485
|
+
- `ConvMemory.retrieve`
|
|
486
|
+
- `ConvMemory.expand_context`
|
|
487
|
+
- `ConvMemory.rerank_embeddings`
|
|
488
|
+
|
|
489
|
+
Public alpha API:
|
|
490
|
+
|
|
491
|
+
- `ConvMemory.attach_ccge_editor`
|
|
492
|
+
- `ConvMemory.load_ccge_editor`
|
|
493
|
+
- `CCGELowAmplitudeEditor`
|
|
494
|
+
- `build_ccge_features`
|
|
495
|
+
|
|
496
|
+
Research-preview code:
|
|
497
|
+
|
|
498
|
+
- context expansion policies for wider agent memory budgets;
|
|
499
|
+
- cascade fusion with cross-encoder scoring;
|
|
500
|
+
- stronger cross-encoder comparison scripts;
|
|
501
|
+
- generic JSONL adapters for external memory-retrieval datasets.
|
|
502
|
+
|
|
503
|
+
Not included in the public package:
|
|
504
|
+
|
|
505
|
+
- raw datasets, checkpoints, embedding caches, teacher caches, and full
|
|
506
|
+
per-question result CSVs;
|
|
507
|
+
- local experiment logs and remote execution archives;
|
|
508
|
+
- exploratory numbered experiment prototypes unless they are explicitly promoted
|
|
509
|
+
into the documented public API.
|
|
510
|
+
|
|
511
|
+
CCGE-LA is packaged as a public alpha API for conflict-aware ConvMemory editing.
|
|
512
|
+
It should still be treated as experimental until a public training recipe and
|
|
513
|
+
same-split public evaluation command are released.
|
|
514
|
+
|
|
515
|
+
## License
|
|
516
|
+
|
|
517
|
+
MIT
|