convmemory 0.4.0__tar.gz

@@ -0,0 +1,21 @@
1
+ MIT License
2
+
3
+ Copyright (c) 2026 ConvMemory contributors
4
+
5
+ Permission is hereby granted, free of charge, to any person obtaining a copy
6
+ of this software and associated documentation files (the "Software"), to deal
7
+ in the Software without restriction, including without limitation the rights
8
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9
+ copies of the Software, and to permit persons to whom the Software is
10
+ furnished to do so, subject to the following conditions:
11
+
12
+ The above copyright notice and this permission notice shall be included in all
13
+ copies or substantial portions of the Software.
14
+
15
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21
+ SOFTWARE.
@@ -0,0 +1,517 @@
1
+ Metadata-Version: 2.1
2
+ Name: convmemory
3
+ Version: 0.4.0
4
+ Summary: Lightweight temporal memory reranking for long-term conversational memory.
5
+ Author: ConvMemory contributors
6
+ License: MIT
7
+ Project-URL: Homepage, https://github.com/pth2002/ConvMemory
8
+ Project-URL: Issues, https://github.com/pth2002/ConvMemory/issues
9
+ Keywords: memory,retrieval,reranking,rag,agents
10
+ Classifier: Development Status :: 4 - Beta
11
+ Classifier: Intended Audience :: Developers
12
+ Classifier: Intended Audience :: Science/Research
13
+ Classifier: Programming Language :: Python :: 3
14
+ Classifier: Programming Language :: Python :: 3.10
15
+ Classifier: Programming Language :: Python :: 3.11
16
+ Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
17
+ Requires-Python: >=3.10
18
+ Description-Content-Type: text/markdown
19
+ License-File: LICENSE
20
+ Requires-Dist: numpy>=1.24
21
+ Requires-Dist: torch>=2.0
22
+ Requires-Dist: sentence-transformers>=2.2.0
23
+ Requires-Dist: transformers>=4.30
24
+ Requires-Dist: tqdm>=4.60
25
+ Requires-Dist: scikit-learn>=1.2
26
+ Provides-Extra: hub
27
+ Requires-Dist: huggingface_hub>=0.20; extra == "hub"
28
+
29
+ # ConvMemory
30
+
31
+ [![CI](https://github.com/pth2002/ConvMemory/actions/workflows/ci.yml/badge.svg)](https://github.com/pth2002/ConvMemory/actions/workflows/ci.yml)
32
+ [![License: MIT](https://img.shields.io/badge/license-MIT-blue.svg)](LICENSE)
33
+ [![Python](https://img.shields.io/badge/python-3.10%2B-blue.svg)](pyproject.toml)
34
+
35
+ ConvMemory is a lightweight learned memory reranker for long-term
36
+ conversational and agent memory.
37
+
38
+ It runs after vector search and before prompt construction:
39
+
40
+ ```text
41
+ user query -> vector search top-k -> ConvMemory -> memory context
42
+ ```
43
+
44
+ ConvMemory is not a vector database, a full QA system, or a general document
45
+ reranker. Its intended use is recall-oriented memory selection for structured
46
+ memory streams: conversations, user histories, agent traces, task logs, and
47
+ session-level notes.
48
+
49
+ Current package version: `0.4.0`
50
+
51
+ ## When To Use It
52
+
53
+ Use ConvMemory when:
54
+
55
+ - your memory store has session or user-history structure;
56
+ - raw vector search misses important neighboring evidence;
57
+ - you need a cheaper high-recall stage before an optional cross-encoder pass;
58
+ - the downstream agent can benefit from a compact, reranked memory context.
59
+
60
+ Do not use ConvMemory as:
61
+
62
+ - a vector database;
63
+ - a general web/document reranker;
64
+ - an end-to-end answer-generation model;
65
+ - a universal replacement for modern cross-encoders.
66
+
67
+ The strongest current deployment pattern is a cascade:
68
+
69
+ ```text
70
+ vector top500 -> ConvMemory candidate stage -> optional small cross-encoder -> memory context
71
+ ```
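The cascade above can be sketched as a small orchestration function. This is a minimal illustration, not the package's implementation: `run_cascade`, `first_stage`, `second_stage`, and `overlap_stage` are hypothetical names, and both stages are plain callables so that a ConvMemory call or a cross-encoder can be plugged in by the application.

```python
from typing import Callable, Optional, Sequence

# A stage takes (query, candidates) and returns the candidates reordered best-first.
Stage = Callable[[str, Sequence[dict]], list]


def run_cascade(
    query: str,
    candidates: Sequence[dict],
    first_stage: Stage,
    second_stage: Optional[Stage] = None,
    first_stage_k: int = 50,
    final_k: int = 10,
) -> list:
    """Cheap high-recall stage first, optional expensive stage on the survivors."""
    shortlist = first_stage(query, candidates)[:first_stage_k]
    if second_stage is not None:
        # The expensive model only sees the shortlist, not the full pool.
        shortlist = second_stage(query, shortlist)
    return shortlist[:final_k]


# Toy stand-in (word overlap) so the sketch runs without loading any model.
def overlap_stage(query, cands):
    words = set(query.lower().split())
    return sorted(cands, key=lambda c: -len(words & set(c["text"].lower().split())))


pool = [
    {"id": "m1", "text": "the hiking trip moved to sunday"},
    {"id": "m2", "text": "bring extra water"},
    {"id": "m3", "text": "exam next friday"},
]
top = run_cascade("when is the hiking trip", pool, overlap_stage, final_k=2)
print([c["id"] for c in top])
```

The design point is that the second stage's cost scales with `first_stage_k`, not with the size of the vector-search pool.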
72
+
73
+ Public alpha: the CCGE-LA conflict editor can be inserted after ConvMemory to
74
+ repair stale/current memory conflicts. See [CCGE-LA](docs/CCGE_LA.md) and the
75
+ broader [research trajectory](docs/RESEARCH_TRAJECTORY.md). The API is
76
+ available in `convmemory.ccge`; the alpha editor checkpoint is published on
77
+ Hugging Face Hub for opt-in use.
78
+
79
+ ## Installation
80
+
81
+ ```bash
82
+ pip install convmemory
83
+ ```
84
+
85
+ For development from source:
86
+
87
+ ```bash
88
+ git clone https://github.com/pth2002/ConvMemory.git
89
+ cd ConvMemory
90
+ pip install -e .
91
+ pip install -r requirements.txt
92
+ ```
93
+
94
+ ConvMemory requires Python 3.10 or later.
95
+
96
+ ## Checkpoints
97
+
98
+ The public LoCoMo MPNet checkpoint is available from Hugging Face Hub:
99
+
100
+ [Purdy0228/ConvMemory-LoCoMo-MPNet](https://huggingface.co/Purdy0228/ConvMemory-LoCoMo-MPNet)
101
+
102
+ ```python
103
+ from convmemory import ConvMemory
104
+
105
+ model = ConvMemory.from_pretrained("Purdy0228/ConvMemory-LoCoMo-MPNet")
106
+ ```
107
+
108
+ The same checkpoint is also distributed as a GitHub release asset:
109
+
110
+ [Download `convmemory-locomo-mpnet.zip`](https://github.com/pth2002/ConvMemory/releases/download/v0.1.0/convmemory-locomo-mpnet.zip)
111
+
112
+ Extract it from the repository root:
113
+
114
+ ```bash
115
+ mkdir -p checkpoints
116
+ unzip convmemory-locomo-mpnet.zip -d checkpoints
117
+ ```
118
+
119
+ On Windows PowerShell:
120
+
121
+ ```powershell
122
+ New-Item -ItemType Directory -Force -Path checkpoints
123
+ Expand-Archive .\convmemory-locomo-mpnet.zip -DestinationPath .\checkpoints -Force
124
+ ```
125
+
126
+ Expected layout:
127
+
128
+ ```text
+ checkpoints/convmemory-locomo-mpnet/
+     config.json
+     model.pt
132
+ ```
133
+
134
+ Verify the checkpoint:
135
+
136
+ ```bash
137
+ python examples/load_pretrained.py
138
+ ```
139
+
140
+ The same checkpoint is used by the current package. Newer package versions add
141
+ library and evaluation utilities; they do not require a new weight file.
142
+
143
+ ### Optional CCGE-LA Alpha Checkpoint
144
+
145
+ The CCGE-LA conflict editor has a separate alpha checkpoint on Hugging Face Hub:
146
+
147
+ [Purdy0228/ConvMemory-CCGE-LA](https://huggingface.co/Purdy0228/ConvMemory-CCGE-LA)
148
+
149
+ Attach it after loading the base ConvMemory checkpoint:
150
+
151
+ ```python
+ from convmemory import ConvMemory
+
+ model = ConvMemory.from_pretrained("Purdy0228/ConvMemory-LoCoMo-MPNet")
+ model.load_ccge_editor("Purdy0228/ConvMemory-CCGE-LA")
154
+ ```
155
+
156
+ The same editor is also distributed as a GitHub release asset:
157
+
158
+ [Download `convmemory-ccge-la-locomo-mpnet-seed23-alpha.zip`](https://github.com/pth2002/ConvMemory/releases/download/ccge-la-alpha-v0.1/convmemory-ccge-la-locomo-mpnet-seed23-alpha.zip)
159
+
160
+ Extract it from the repository root:
161
+
162
+ ```bash
163
+ unzip convmemory-ccge-la-locomo-mpnet-seed23-alpha.zip -d checkpoints
164
+ ```
165
+
166
+ If using local files, attach it after loading the base ConvMemory checkpoint:
167
+
168
+ ```python
+ from convmemory import ConvMemory
+
+ model = ConvMemory.from_pretrained("checkpoints/convmemory-locomo-mpnet")
+ model.load_ccge_editor("checkpoints/convmemory-ccge-la-locomo-mpnet-seed23-alpha")
171
+ ```
172
+
173
+ This is an alpha LoCoMo/MPNet editor trained on the seed-23 split. It is useful
174
+ for trying the public CCGE-LA API, but it is not yet the default checkpoint.
175
+ SHA256:
176
+ `459ecfb2b4c35887f1d8f2cdd87dab402c37bd8dee86628655eff08f314b2e7c`.
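A digest like the one above can be checked with the standard library before loading the editor. This is a hedged sketch: the README does not say which file the SHA256 covers, so the assumption here is that it refers to the downloaded release archive; `sha256_of_file` and `matches_expected` are hypothetical helper names.

```python
import hashlib

# Digest quoted in the README; assumed (not confirmed) to cover the release archive.
EXPECTED = "459ecfb2b4c35887f1d8f2cdd87dab402c37bd8dee86628655eff08f314b2e7c"


def sha256_of_file(path, chunk_size=1 << 20):
    """Stream a file through SHA-256 and return the hex digest."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read in chunks so large checkpoint files are never fully in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path, expected=EXPECTED):
    """True if the file's digest equals the expected hex string."""
    return sha256_of_file(path) == expected.lower()
```

For example, `matches_expected("convmemory-ccge-la-locomo-mpnet-seed23-alpha.zip")` would be called from the directory that holds the download.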
177
+
178
+ ## Quick Start
179
+
180
+ ```python
+ from convmemory import ConvMemory
+
+ # Load the public LoCoMo MPNet checkpoint from Hugging Face Hub.
+ model = ConvMemory.from_pretrained(
+     "Purdy0228/ConvMemory-LoCoMo-MPNet",
+     device="cuda",
+ )
+
+ # Each memory is a dict with an id and its text.
+ memories = [
+     {"id": "m1", "text": "The user said their hiking trip moved to Sunday."},
+     {"id": "m2", "text": "The assistant recommended bringing extra water."},
+     {"id": "m3", "text": "The user has an exam next Friday."},
+ ]
+
+ # Rerank the memories for the query and keep the two best.
+ results = model.rerank(
+     query="When is the hiking trip?",
+     memories=memories,
+     top_k=2,
+ )
+
+ for item in results:
+     print(item.rank, item.memory_id, item.score, item.text)
202
+ ```
203
+
204
+ Pass memories in a stable application order when that order is available.
205
+
206
+ ## Agent Memory Integration
207
+
208
+ Most applications call ConvMemory after vector search:
209
+
210
+ ```python
+ from convmemory import ConvMemory
+
+ memory_reranker = ConvMemory.from_pretrained(
+     "Purdy0228/ConvMemory-LoCoMo-MPNet",
+     device="cuda",
+ )
+
+ def retrieve_agent_memory(query, memory_store, top_k=15):
+     # Wide, cheap candidate pool from the application's vector store.
+     candidates = memory_store.vector_search(query, top_k=500)
+
+     # ConvMemory reorders the pool and keeps the top_k memories.
+     ranked = memory_reranker.retrieve(
+         query=query,
+         memories=candidates,
+         mode="rerank",
+         top_k=top_k,
+     )
+
+     return [
+         {"id": item.memory_id, "text": item.text, "score": item.score}
+         for item in ranked
+     ]
232
+ ```
233
+
234
+ If the downstream agent can read a slightly wider context, use `expand` mode.
235
+ It preserves the strongest ConvMemory prefix and fills the remaining budget
236
+ with complementary candidates:
237
+
238
+ ```python
+ context = memory_reranker.retrieve(
+     query=query,
+     memories=candidates,
+     mode="expand",
+     protected_k=10,
+     top_k=15,
+ )
246
+ ```
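The "protected prefix plus fill" idea can be illustrated with a few lines of plain Python. This is only a sketch of the general pattern: the README does not describe how the library chooses complementary candidates, so filling from the original vector-search order is an assumption made here, and `protected_prefix_fill` is a hypothetical name, not the package's `expand_context`.

```python
def protected_prefix_fill(reranked_ids, vector_ids, protected_k=10, top_k=15):
    """Keep the first `protected_k` reranked ids, then fill up to `top_k`
    with ids from the vector-search order that are not already selected."""
    selected = list(reranked_ids[:protected_k])
    seen = set(selected)
    for memory_id in vector_ids:
        if len(selected) >= top_k:
            break
        if memory_id not in seen:
            selected.append(memory_id)
            seen.add(memory_id)
    return selected


reranked = ["m4", "m1", "m9", "m2"]       # reranker order, best first
vector = ["m1", "m2", "m3", "m4", "m9"]   # raw vector-search order
context_ids = protected_prefix_fill(reranked, vector, protected_k=2, top_k=4)
print(context_ids)
```

The protected prefix guarantees that widening the budget never displaces the reranker's strongest picks.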
247
+
248
+ For systems that already store embeddings, use `rerank_embeddings` to avoid
249
+ re-encoding the memory store:
250
+
251
+ ```python
+ ranked = memory_reranker.rerank_embeddings(
+     query_embedding=query_embedding,
+     memory_embeddings=memory_embeddings,
+     memory_ids=memory_ids,
+     memory_texts=memory_texts,
+     candidate_indices=candidate_indices,
+     query=query,
+     top_k=20,
+ )
261
+ ```
262
+
263
+ To enable the CCGE-LA conflict editor after ConvMemory, attach a trained editor
264
+ checkpoint and pass `editor="ccge_la"`:
265
+
266
+ ```python
+ memory_reranker.load_ccge_editor("Purdy0228/ConvMemory-CCGE-LA")
+
+ ranked = memory_reranker.retrieve(
+     query=query,
+     memories=candidates,
+     mode="rerank",
+     editor="ccge_la",
+     top_k=15,
+ )
276
+ ```
277
+
278
+ The checkpoint and embeddings must use the same embedding model family and
279
+ embedding dimension.
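A cheap guard for the dimension half of this requirement is sketched below. It is an illustration only: `check_embedding_dims` is a hypothetical helper, and `expected_dim` must come from the checkpoint in use (the README does not state it). A matching dimension does not prove the vectors come from the same model family, so this catches only the most obvious mismatch.

```python
def check_embedding_dims(query_embedding, memory_embeddings, expected_dim):
    """Raise ValueError if any vector does not have `expected_dim` entries."""
    if len(query_embedding) != expected_dim:
        raise ValueError(
            f"query embedding has {len(query_embedding)} dims, expected {expected_dim}"
        )
    for index, vector in enumerate(memory_embeddings):
        if len(vector) != expected_dim:
            raise ValueError(
                f"memory embedding {index} has {len(vector)} dims, expected {expected_dim}"
            )


# Toy 4-dimensional vectors; real embeddings are much wider.
query_vec = [0.1, 0.2, 0.3, 0.4]
memory_vecs = [[0.0, 0.1, 0.0, 0.1], [0.5, 0.5, 0.5, 0.5]]
check_embedding_dims(query_vec, memory_vecs, expected_dim=4)  # passes silently
```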
280
+
281
+ ## Results
282
+
283
+ These are retrieval-stage evaluations. They measure whether annotated evidence
284
+ memories are retrieved into the top-k list; they do not measure final answer
285
+ generation.
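For readers unfamiliar with the metrics in the tables, the sketch below shows the textbook per-question definitions of Recall@k, Hit@k, and reciprocal rank (averaged over questions to get MRR). The exact protocol used for the reported numbers is in `docs/EVALUATION_PROTOCOL.md` and may differ in details such as rank cutoffs; the function names here are illustrative.

```python
def recall_at_k(ranked_ids, evidence_ids, k=10):
    """Fraction of annotated evidence memories that appear in the top-k list."""
    evidence = set(evidence_ids)
    if not evidence:
        return 0.0
    return len(evidence & set(ranked_ids[:k])) / len(evidence)


def hit_at_k(ranked_ids, evidence_ids, k=10):
    """1.0 if at least one evidence memory is in the top-k list, else 0.0."""
    return 1.0 if set(evidence_ids) & set(ranked_ids[:k]) else 0.0


def reciprocal_rank(ranked_ids, evidence_ids):
    """1 / rank of the first evidence memory (0.0 if none is retrieved)."""
    evidence = set(evidence_ids)
    for rank, memory_id in enumerate(ranked_ids, start=1):
        if memory_id in evidence:
            return 1.0 / rank
    return 0.0


ranked = ["m7", "m2", "m9", "m4"]   # reranker output, best first
evidence = ["m2", "m4", "m5"]       # annotated evidence for one question
print(recall_at_k(ranked, evidence, k=3))
print(hit_at_k(ranked, evidence, k=3))
print(reciprocal_rank(ranked, evidence))
```

Recall@k rewards retrieving all evidence, Hit@k rewards retrieving any of it, and reciprocal rank rewards placing the first piece of evidence near the top.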
286
+
287
+ The tables below are summarized from internal evaluation artifacts. Large
288
+ per-question CSV files, embedding caches, teacher caches, and checkpoints are
289
+ intentionally excluded from the public Git history. See the `docs/` directory
290
+ for the public evaluation protocol, training notes, model card, and negative
291
+ results write-up.
292
+
293
+ Note: v0.40-v0.51 are internal evaluation-iteration identifiers for hardening
294
+ experiments, not packaged PyPI releases. The installable package version is
295
+ 0.4.0 and the public base checkpoint is unchanged.
296
+
297
+ Important scope notes:
298
+
299
+ - The public checkpoint is trained on LoCoMo-style data; LoCoMo is in-domain
300
+ for this checkpoint.
301
+ - The headline value proposition is cost-effective learned reranking for memory
302
+ tasks, plus a rigorous negative result about mechanism attribution.
303
+ - Stronger rerankers matter. ConvMemory beats BGE-reranker-base/large on
304
+ LoCoMo Recall@10, but it loses to `mxbai-rerank-large-v1` on both Recall@10
305
+ and MRR.
306
+ - v0.50/v0.51 refute the stronger claim that temporal structure is the reason
307
+ ConvMemory works. The learned temporal window contributes statistically, but
308
+ its benefit is not temporally specific.
309
+ - External OOD results are mixed. The MuSiQue negative result is reported below
310
+ because ConvMemory is not intended as a broad multi-hop document reranker.
311
+ - Latency numbers assume memory embeddings and memory-side indexes are already
312
+ available. Cross-encoder timing includes pairwise scoring through the tested
313
+ `CrossEncoder` path.
314
+
315
+ ### LongMemEval Cost Advantage
316
+
317
+ This is the strongest practical story for ConvMemory: on memory-family tasks it
318
+ offers a much cheaper reranking stage while remaining recall-competitive.
319
+
320
+ | Setting | Method | Recall@10 | MRR | ms/query |
321
+ |---|---|---:|---:|---:|
322
+ | Clean500, BGE-large CE | Raw MPNet | 0.9049 | 0.7829 | 0.01 |
323
+ | Clean500, BGE-large CE | BGE-large CE top500 | 0.8807 | 0.8574 | 555.69 |
324
+ | Clean500, BGE-large CE | ConvMemory top500 | 0.9593 | 0.8973 | 44.00 |
325
+ | Clean500, mxbai CE | Raw MPNet | 0.9049 | 0.7829 | 0.01 |
326
+ | Clean500, mxbai CE | mxbai CE top500 | 0.9835 | 0.9317 | 1129.14 |
327
+ | Clean500, mxbai CE | ConvMemory top500 | 0.9593 | 0.8973 | 40.80 |
328
+ | Stress1000 seed23, BGE-large CE | Raw MPNet | 0.5452 | 0.4561 | 0.13 |
329
+ | Stress1000 seed23, BGE-large CE | BGE-large CE top500 | 0.6913 | 0.6651 | 5231.77 |
330
+ | Stress1000 seed23, BGE-large CE | ConvMemory candidate-local | 0.7386 | 0.6125 | 110.71 |
331
+ | Stress1000 seed23, mxbai CE | Raw MPNet | 0.5452 | 0.4561 | 0.12 |
332
+ | Stress1000 seed23, mxbai CE | mxbai CE top500 | 0.8195 | 0.7044 | 11211.63 |
333
+ | Stress1000 seed23, mxbai CE | ConvMemory candidate-local | 0.7386 | 0.6125 | 95.57 |
334
+
335
+ LongMemEval numbers are not seed-averaged: Clean500 is a single run and Stress1000 is reported for a single seed (23). Read these as indicative single-run retrieval-stage checks, not benchmark-grade comparisons.
336
+
337
+ Reading: ConvMemory reranks above BGE-large CE on these memory-family Recall@10
338
+ checks while being about 12-47x faster. It remains below mxbai accuracy, but is
339
+ about 28-117x lower latency in the tested settings.
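The speedup ranges quoted above follow directly from the `ms/query` column. A short worked calculation, using only values copied from the LongMemEval table:

```python
# (setting, cross-encoder, cross-encoder ms/query, ConvMemory ms/query),
# all copied from the LongMemEval table above.
rows = [
    ("Clean500", "BGE-large CE", 555.69, 44.00),
    ("Clean500", "mxbai CE", 1129.14, 40.80),
    ("Stress1000 seed23", "BGE-large CE", 5231.77, 110.71),
    ("Stress1000 seed23", "mxbai CE", 11211.63, 95.57),
]

# Speedup = cross-encoder latency divided by ConvMemory latency in the same run.
speedups = {
    (setting, ce): ce_ms / conv_ms for setting, ce, ce_ms, conv_ms in rows
}

for (setting, ce), ratio in speedups.items():
    print(f"{setting:18s} {ce:13s} {ratio:6.1f}x")
```

The BGE-large ratios land at roughly 12.6x and 47.3x, and the mxbai ratios at roughly 27.7x and 117.3x, which is where the 12-47x and 28-117x ranges come from.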
340
+
341
+ ### LoCoMo Strong Cross-Encoder Baselines
342
+
343
+ Five split seeds: 7, 11, 23, 31, 47. Candidate pool: raw dense top500.
344
+
345
+ | Reranker | Recall@10 | Hit@10 | MRR |
346
+ |---|---:|---:|---:|
347
+ | ConvMemory (v0.40 5-seed) | 0.7798 +/- 0.0074 | not reported | 0.5824 |
348
+ | BGE-reranker-base | 0.6967 +/- 0.0126 | 0.7469 +/- 0.0144 | 0.5469 +/- 0.0140 |
349
+ | Jina-reranker-v2-base-multilingual | 0.7411 +/- 0.0103 | 0.7924 +/- 0.0083 | 0.5754 +/- 0.0074 |
350
+ | BGE-reranker-large | 0.7621 +/- 0.0155 | 0.8124 +/- 0.0135 | 0.6120 +/- 0.0144 |
351
+ | mxbai-rerank-large-v1 | 0.8080 +/- 0.0153 | 0.8486 +/- 0.0108 | 0.6687 +/- 0.0093 |
352
+
353
+ Reading: ConvMemory is competitive on recall, but it should not be given an
354
+ overall cross-encoder superiority claim. `mxbai-rerank-large-v1` is stronger on
355
+ LoCoMo Recall@10 and MRR, and BGE-reranker-large is also stronger on MRR (0.6120 vs 0.5824).
356
+
357
+ ### Retrained Ablation
358
+
359
+ Three split seeds, MPNet. These are retrained ablations, not inference-time
360
+ feature masks.
361
+
362
+ | Variant | Recall@10 | MRR | Delta R@10 vs full |
363
+ |---|---:|---:|---:|
364
+ | full_control | 0.7474 +/- 0.0229 | 0.5343 +/- 0.0160 | 0.0000 |
365
+ | no_router | 0.7491 +/- 0.0213 | 0.5391 +/- 0.0137 | +0.0017 +/- 0.0020 |
366
+ | no_temporal_w1 | 0.7121 +/- 0.0232 | 0.5305 +/- 0.0148 | -0.0353 +/- 0.0052 |
367
+ | no_lexical | 0.6584 +/- 0.0185 | 0.4367 +/- 0.0129 | -0.0890 +/- 0.0061 |
368
+ | no_lexical_no_router | 0.6574 +/- 0.0163 | 0.4342 +/- 0.0127 | -0.0899 +/- 0.0087 |
369
+
370
+ Reading: lexical interaction features are the largest contributor. The
371
+ no-temporal variant is weaker than the full model in this three-seed ablation,
372
+ but this table alone does not prove that the gain is temporally specific. The
373
+ router/DCA scalar contributes approximately zero; removing it is neutral to
374
+ slightly positive, so it is treated as an experimental negative result rather
375
+ than a model feature.
376
+
377
+ ### Attribution / Negative Result
378
+
379
+ The v0.50/v0.51 follow-up was designed to test whether the temporal window is
380
+ the load-bearing reason ConvMemory works. This section uses the retrained
381
+ attribution pipeline, not the v0.40 headline pipeline above.
382
+
383
+ Five split seeds: 7, 11, 23, 31, 47.
384
+
385
+ | Method | Recall@10 |
386
+ |---|---:|
387
+ | full_control_retrained | 0.7432 +/- 0.0207 |
388
+ | no_temporal_w1_retrained | 0.7054 +/- 0.0221 |
389
+ | tuned_heuristic | 0.7234 +/- 0.0227 |
390
+ | raw_dense | 0.5345 +/- 0.0210 |
391
+
392
+ Paired bootstrap, `full_control_retrained - no_temporal_w1_retrained`,
393
+ Recall@10:
394
+
395
+ | Slice | Delta | 95% CI | Reading |
396
+ |---|---:|---:|---|
397
+ | ALL | +0.0376 | [+0.0306, +0.0451] | significant |
398
+ | T_SUP_auto | +0.0407 | [+0.0219, +0.0603] | significant, open question |
399
+ | T_REQUIRED_auto | +0.0252 | [+0.0139, +0.0363] | significant |
400
+ | T_HOP_auto | +0.0096 | [-0.0037, +0.0230] | not significant |
401
+ | OTHER | +0.0868 | [+0.0672, +0.1045] | significant |
402
+ | HARD_NON_TEMPORAL_auto | +0.0838 | [+0.0650, +0.1040] | significant |
403
+
404
+ The honest reading is negative for the original temporal-mechanism thesis: the
405
+ learned temporal window contributes on aggregate, but its benefit is largest on
406
+ hard non-temporal controls (`OTHER` and `HARD_NON_TEMPORAL_auto`) and is not
407
+ statistically significant on the most temporal multi-hop proxy (`T_HOP_auto`).
408
+ This looks more like generic neighborhood/capacity smoothing than proven
409
+ temporal-structure exploitation. `T_SUP_auto` remains the only notable open
410
+ question, but it is still smaller than the hard non-temporal control effect and
411
+ should not be used as a load-bearing temporal claim.
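For context on how deltas and intervals of this kind are typically produced, here is a minimal paired percentile bootstrap over per-question scores. It is a generic sketch, not the project's attribution pipeline: the resampling unit (questions), the number of resamples, and the name `paired_bootstrap_ci` are assumptions, and the toy scores are made up for illustration.

```python
import random


def paired_bootstrap_ci(scores_a, scores_b, n_resamples=2000, alpha=0.05, seed=0):
    """Mean paired difference (a - b) with a percentile bootstrap CI."""
    if len(scores_a) != len(scores_b):
        raise ValueError("paired scores must have the same length")
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    n = len(diffs)
    rng = random.Random(seed)
    means = []
    for _ in range(n_resamples):
        # Resample questions with replacement, keeping each (a, b) pair together.
        sample = [diffs[rng.randrange(n)] for _ in range(n)]
        means.append(sum(sample) / n)
    means.sort()
    lo = means[int((alpha / 2) * n_resamples)]
    hi = means[int((1 - alpha / 2) * n_resamples) - 1]
    return sum(diffs) / n, (lo, hi)


# Toy per-question Recall@10 values (made up, not from the evaluation).
full = [1.0, 0.5, 1.0, 0.0, 1.0, 0.5, 1.0, 1.0]
ablated = [1.0, 0.5, 0.5, 0.0, 0.5, 0.5, 1.0, 0.5]
delta, (ci_low, ci_high) = paired_bootstrap_ci(full, ablated)
# A slice reads as "significant" when the interval excludes zero.
significant = ci_low > 0 or ci_high < 0
print(delta, ci_low, ci_high, significant)
```

Pairing matters because both systems are scored on the same questions; resampling differences rather than each system independently removes per-question difficulty from the variance.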
412
+
413
+ Against the tuned heuristic, the same retrained attribution pipeline gives
414
+ `full_control_retrained` a Recall@10 delta of +0.0199 with 95% CI
415
+ [+0.0105, +0.0283], and an MRR delta of +0.0566. So the learned reranker still
416
+ adds value; the negative result is about why it works.
417
+
418
+ See [docs/NEGATIVE_RESULTS.md](docs/NEGATIVE_RESULTS.md) for the full
419
+ v0.50/v0.51 interpretation.
420
+
421
+ For the later current/stale and conflict-editor research trajectory, see
422
+ [docs/RESEARCH_TRAJECTORY.md](docs/RESEARCH_TRAJECTORY.md). That document
423
+ summarizes the internal v0.60+ research line in a public-safe form without raw
424
+ logs, private paths, caches, or exploratory scripts.
425
+
426
+ ### Strong-Backbone Retraining
427
+
428
+ Three split seeds. ConvMemory is retrained in each embedding space.
429
+
430
+ | Backbone | Raw Recall@10 | ConvMemory Recall@10 | Gain | ConvMemory MRR |
431
+ |---|---:|---:|---:|---:|
432
+ | BGE-large | 0.6680 +/- 0.0237 | 0.7726 +/- 0.0100 | +0.1046 +/- 0.0137 | 0.5639 +/- 0.0066 |
433
+ | E5-large | 0.7010 +/- 0.0216 | 0.7902 +/- 0.0171 | +0.0892 +/- 0.0052 | 0.5941 +/- 0.0103 |
434
+
435
+ Reading: ConvMemory gains are not just an artifact of a weak MPNet retriever.
436
+ Retraining on stronger embeddings still gives about +0.09 to +0.10 absolute Recall@10 (9 to 10 percentage points).
437
+
438
+ ### External OOD Results
439
+
440
+ Single run per dataset. These are intentionally reported as mixed evidence.
441
+
442
+ | Dataset | Questions | ConvMemory R@10 | Raw dense | Dense + lexical | BM25 |
443
+ |---|---:|---:|---:|---:|---:|
444
+ | QMSum | 272 | 0.5882 | 0.4724 | 0.5423 | 0.5294 |
445
+ | MSC persona | 6155 | 0.9632 | 0.8375 | 0.9765 | 0.9920 |
446
+ | HotpotQA | 1000 | 0.7983 | 0.7682 | 0.8621 | 0.8280 |
447
+ | MuSiQue | 1000 | 0.7635 | 0.8640 | 0.8175 | 0.7245 |
448
+
449
+ These external OOD results are single runs without seed averaging or confidence intervals; treat them as indicative scope checks, not benchmark-grade comparisons.
450
+
451
+ Reading: ConvMemory wins on QMSum and improves strongly over raw dense on MSC,
452
+ but lexical/BM25 baselines dominate MSC's weak persona-overlap labels. On
453
+ HotpotQA, both the trivial dense+lexical baseline and BM25 are stronger. On MuSiQue, ConvMemory
454
+ regresses below raw dense. This is a scope boundary: ConvMemory is a memory
455
+ reranker, not a general multi-hop document reranker.
456
+
457
+ ### Where ConvMemory Fails
458
+
459
+ - Non-temporal multi-hop retrieval: on MuSiQue, ConvMemory scores below raw dense (0.7635 vs 0.8640 Recall@10).
460
+ - Lexically anchored document retrieval: HotpotQA favors dense+lexical scoring.
461
+ - Maximum top-rank precision: mxbai-rerank-large remains stronger on LoCoMo MRR.
462
+ - Cross-query score calibration: scores should not be treated as calibrated
463
+ confidence without application-specific validation.
464
+ - Mechanism attribution: v0.51 does not support temporal structure as the
465
+ load-bearing explanation for ConvMemory's gain.
466
+
467
+ ## Reproducibility
468
+
469
+ The current documentation reports the hardened v0.47/v0.51 audit. See:
470
+
471
+ - [docs/EVALUATION_PROTOCOL.md](docs/EVALUATION_PROTOCOL.md)
472
+ - [docs/MODEL_CARD.md](docs/MODEL_CARD.md)
473
+ - [docs/TRAINING.md](docs/TRAINING.md)
474
+
475
+ Main evaluation artifacts are kept outside the repository history. Large
476
+ per-question CSV files, teacher caches, and checkpoints are intentionally not
477
+ committed.
478
+
479
+ ## Project Status
480
+
481
+ Stable public API:
482
+
483
+ - `ConvMemory.from_pretrained`
484
+ - `ConvMemory.rerank`
485
+ - `ConvMemory.retrieve`
486
+ - `ConvMemory.expand_context`
487
+ - `ConvMemory.rerank_embeddings`
488
+
489
+ Public alpha API:
490
+
491
+ - `ConvMemory.attach_ccge_editor`
492
+ - `ConvMemory.load_ccge_editor`
493
+ - `CCGELowAmplitudeEditor`
494
+ - `build_ccge_features`
495
+
496
+ Research-preview code:
497
+
498
+ - context expansion policies for wider agent memory budgets;
499
+ - cascade fusion with cross-encoder scoring;
500
+ - stronger cross-encoder comparison scripts;
501
+ - generic JSONL adapters for external memory-retrieval datasets.
502
+
503
+ Not included in the public package:
504
+
505
+ - raw datasets, checkpoints, embedding caches, teacher caches, and full
506
+ per-question result CSVs;
507
+ - local experiment logs and remote execution archives;
508
+ - exploratory numbered experiment prototypes unless they are explicitly promoted
509
+ into the documented public API.
510
+
511
+ CCGE-LA is packaged as a public alpha API for conflict-aware ConvMemory editing.
512
+ It should still be treated as experimental until a public training recipe and
513
+ same-split public evaluation command are released.
514
+
515
+ ## License
516
+
517
+ MIT