wavemind 2.2.0__tar.gz → 2.2.2__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (160)
  1. wavemind-2.2.2/Dockerfile +28 -0
  2. {wavemind-2.2.0 → wavemind-2.2.2}/MANIFEST.in +5 -0
  3. {wavemind-2.2.0 → wavemind-2.2.2}/PKG-INFO +316 -32
  4. {wavemind-2.2.0 → wavemind-2.2.2}/README.md +315 -31
  5. wavemind-2.2.2/benchmarks/BENCHMARK_LEADERBOARD.md +32 -0
  6. {wavemind-2.2.0 → wavemind-2.2.2}/benchmarks/BENCHMARK_REPORT.md +11 -4
  7. {wavemind-2.2.0 → wavemind-2.2.2}/benchmarks/ann_index_curve_benchmark.py +8 -6
  8. {wavemind-2.2.0 → wavemind-2.2.2}/benchmarks/benchmark_matrix_results.json +375 -13
  9. {wavemind-2.2.0 → wavemind-2.2.2}/benchmarks/benchmark_registry.py +300 -28
  10. {wavemind-2.2.0 → wavemind-2.2.2}/benchmarks/field_memory_dynamics_benchmark.py +28 -1
  11. {wavemind-2.2.0 → wavemind-2.2.2}/benchmarks/field_memory_dynamics_results.json +5 -3
  12. wavemind-2.2.2/benchmarks/longmemeval_answer_benchmark.py +599 -0
  13. {wavemind-2.2.0 → wavemind-2.2.2}/benchmarks/longmemeval_answer_extractive_20_results.json +91 -4
  14. wavemind-2.2.2/benchmarks/longmemeval_answer_qwen25_0_5b_50_results.json +344 -0
  15. wavemind-2.2.2/benchmarks/longmemeval_answer_qwen25_1_5b_50_results.json +344 -0
  16. {wavemind-2.2.0 → wavemind-2.2.2}/benchmarks/longmemeval_evidence_50_results.json +10 -10
  17. {wavemind-2.2.0 → wavemind-2.2.2}/benchmarks/longmemeval_memory_benchmark.py +2 -1
  18. wavemind-2.2.2/benchmarks/memory_competitor_benchmark.py +244 -0
  19. wavemind-2.2.2/benchmarks/memory_competitor_results.json +41 -0
  20. wavemind-2.2.2/benchmarks/nomiracl_russian_benchmark.py +242 -0
  21. wavemind-2.2.2/benchmarks/nomiracl_russian_results.json +53 -0
  22. wavemind-2.2.2/benchmarks/production_index_profile_results.json +83 -0
  23. wavemind-2.2.2/benchmarks/production_load_benchmark.py +164 -0
  24. wavemind-2.2.2/benchmarks/production_load_qdrant_1m_results.json +66 -0
  25. wavemind-2.2.2/benchmarks/production_load_results.json +79 -0
  26. {wavemind-2.2.0 → wavemind-2.2.2}/benchmarks/render_benchmark_charts.py +25 -18
  27. wavemind-2.2.2/benchmarks/render_benchmark_leaderboard.py +234 -0
  28. {wavemind-2.2.0 → wavemind-2.2.2}/benchmarks/render_benchmark_report.py +4 -0
  29. wavemind-2.2.2/benchmarks/scale_readiness_benchmark.py +266 -0
  30. wavemind-2.2.2/benchmarks/scale_readiness_results.json +49 -0
  31. {wavemind-2.2.0 → wavemind-2.2.2}/benchmarks/wavemind_capacity_results.json +1 -1
  32. {wavemind-2.2.0 → wavemind-2.2.2}/docker-compose.yml +1 -1
  33. wavemind-2.2.2/docs/BENCHMARK_BRIEF.md +248 -0
  34. {wavemind-2.2.0 → wavemind-2.2.2}/docs/CHROMA_MIGRATION.md +16 -0
  35. {wavemind-2.2.0 → wavemind-2.2.2}/docs/LAUNCH_KIT.md +21 -6
  36. wavemind-2.2.2/docs/OBSERVABILITY.md +197 -0
  37. {wavemind-2.2.0 → wavemind-2.2.2}/docs/ROADMAP.md +51 -11
  38. {wavemind-2.2.0 → wavemind-2.2.2}/docs/RU_LAUNCH_POSTS.md +26 -43
  39. {wavemind-2.2.0 → wavemind-2.2.2}/docs/USE_CASES.md +25 -0
  40. {wavemind-2.2.0 → wavemind-2.2.2}/docs/assets/benchmark-summary.svg +60 -48
  41. wavemind-2.2.2/docs/assets/wavemind-demo.gif +0 -0
  42. wavemind-2.2.2/examples/chroma_migration.py +172 -0
  43. wavemind-2.2.2/examples/customer_support_memory.py +214 -0
  44. wavemind-2.2.2/examples/llamaindex_retriever.py +88 -0
  45. wavemind-2.2.2/examples/observability/README.md +18 -0
  46. wavemind-2.2.2/examples/observability/docker-compose.yml +44 -0
  47. wavemind-2.2.2/examples/observability/otel-collector.yaml +22 -0
  48. wavemind-2.2.2/examples/observability/prometheus-alerts.yml +39 -0
  49. wavemind-2.2.2/examples/observability/prometheus.yml +16 -0
  50. wavemind-2.2.2/examples/production-index-profile/README.md +39 -0
  51. wavemind-2.2.2/examples/production-index-profile/docker-compose.yml +64 -0
  52. wavemind-2.2.2/examples/research_notebook_memory.py +220 -0
  53. {wavemind-2.2.0 → wavemind-2.2.2}/pyproject.toml +1 -1
  54. {wavemind-2.2.0 → wavemind-2.2.2}/requirements-optional.txt +4 -0
  55. {wavemind-2.2.0 → wavemind-2.2.2}/tests/test_api.py +133 -0
  56. wavemind-2.2.2/tests/test_benchmark_brief.py +35 -0
  57. wavemind-2.2.2/tests/test_benchmark_leaderboard.py +39 -0
  58. {wavemind-2.2.0 → wavemind-2.2.2}/tests/test_benchmark_registry.py +8 -1
  59. {wavemind-2.2.0 → wavemind-2.2.2}/tests/test_benchmark_report.py +2 -0
  60. wavemind-2.2.2/tests/test_chroma_migration_example.py +92 -0
  61. {wavemind-2.2.0 → wavemind-2.2.2}/tests/test_cli_smoke.py +54 -0
  62. wavemind-2.2.2/tests/test_cluster.py +100 -0
  63. wavemind-2.2.2/tests/test_examples.py +259 -0
  64. {wavemind-2.2.0 → wavemind-2.2.2}/tests/test_field_graph_integration.py +42 -0
  65. {wavemind-2.2.0 → wavemind-2.2.2}/tests/test_field_memory_dynamics_benchmark.py +2 -0
  66. {wavemind-2.2.0 → wavemind-2.2.2}/tests/test_indexes_encoders.py +25 -0
  67. wavemind-2.2.2/tests/test_jobs.py +56 -0
  68. {wavemind-2.2.0 → wavemind-2.2.2}/tests/test_longmemeval_answer_benchmark.py +53 -1
  69. wavemind-2.2.2/tests/test_memory_competitor_benchmark.py +50 -0
  70. wavemind-2.2.2/tests/test_multimodal.py +71 -0
  71. wavemind-2.2.2/tests/test_nomiracl_russian_benchmark.py +73 -0
  72. {wavemind-2.2.0 → wavemind-2.2.2}/tests/test_observability.py +42 -0
  73. wavemind-2.2.2/tests/test_observability_docs.py +30 -0
  74. {wavemind-2.2.0 → wavemind-2.2.2}/tests/test_packaging_files.py +42 -1
  75. wavemind-2.2.2/tests/test_production_index_profile.py +47 -0
  76. wavemind-2.2.2/tests/test_production_load_benchmark.py +73 -0
  77. wavemind-2.2.2/tests/test_scale_plan.py +171 -0
  78. wavemind-2.2.2/tests/test_scale_readiness_benchmark.py +18 -0
  79. {wavemind-2.2.0 → wavemind-2.2.2}/wavemind/__init__.py +28 -1
  80. {wavemind-2.2.0 → wavemind-2.2.2}/wavemind/api.py +222 -37
  81. {wavemind-2.2.0 → wavemind-2.2.2}/wavemind/cli.py +148 -0
  82. wavemind-2.2.2/wavemind/cluster.py +215 -0
  83. {wavemind-2.2.0 → wavemind-2.2.2}/wavemind/core.py +184 -0
  84. {wavemind-2.2.0 → wavemind-2.2.2}/wavemind/indexes.py +67 -24
  85. wavemind-2.2.2/wavemind/jobs.py +216 -0
  86. wavemind-2.2.2/wavemind/multimodal.py +147 -0
  87. wavemind-2.2.2/wavemind/scale.py +152 -0
  88. {wavemind-2.2.0 → wavemind-2.2.2}/wavemind/studio.py +7 -3
  89. {wavemind-2.2.0 → wavemind-2.2.2}/wavemind.egg-info/PKG-INFO +316 -32
  90. {wavemind-2.2.0 → wavemind-2.2.2}/wavemind.egg-info/SOURCES.txt +45 -0
  91. wavemind-2.2.0/Dockerfile +0 -23
  92. wavemind-2.2.0/benchmarks/longmemeval_answer_benchmark.py +0 -351
  93. wavemind-2.2.0/tests/test_examples.py +0 -132
  94. {wavemind-2.2.0 → wavemind-2.2.2}/CONTRIBUTING.md +0 -0
  95. {wavemind-2.2.0 → wavemind-2.2.2}/LICENSE +0 -0
  96. {wavemind-2.2.0 → wavemind-2.2.2}/SECURITY.md +0 -0
  97. {wavemind-2.2.0 → wavemind-2.2.2}/SUPPORT.md +0 -0
  98. {wavemind-2.2.0 → wavemind-2.2.2}/benchmarks/agent_memory_benchmark.py +0 -0
  99. {wavemind-2.2.0 → wavemind-2.2.2}/benchmarks/agent_memory_results.json +0 -0
  100. {wavemind-2.2.0 → wavemind-2.2.2}/benchmarks/ann_index_curve_results.json +0 -0
  101. {wavemind-2.2.0 → wavemind-2.2.2}/benchmarks/dynamic_memory_benchmark.py +0 -0
  102. {wavemind-2.2.0 → wavemind-2.2.2}/benchmarks/dynamic_memory_results.json +0 -0
  103. {wavemind-2.2.0 → wavemind-2.2.2}/benchmarks/locomo_evidence_results.json +0 -0
  104. {wavemind-2.2.0 → wavemind-2.2.2}/benchmarks/locomo_memory_benchmark.py +0 -0
  105. {wavemind-2.2.0 → wavemind-2.2.2}/benchmarks/locomo_sentence_evidence_results.json +0 -0
  106. {wavemind-2.2.0 → wavemind-2.2.2}/benchmarks/long_memory_evidence_benchmark.py +0 -0
  107. {wavemind-2.2.0 → wavemind-2.2.2}/benchmarks/long_memory_evidence_results.json +0 -0
  108. {wavemind-2.2.0 → wavemind-2.2.2}/benchmarks/longmemeval_evidence_results.json +0 -0
  109. {wavemind-2.2.0 → wavemind-2.2.2}/benchmarks/open_retrieval_benchmark.py +0 -0
  110. {wavemind-2.2.0 → wavemind-2.2.2}/benchmarks/open_retrieval_scifact_results.json +0 -0
  111. {wavemind-2.2.0 → wavemind-2.2.2}/benchmarks/ru_sentences_benchmark.py +0 -0
  112. {wavemind-2.2.0 → wavemind-2.2.2}/docs/DEMO_SCRIPT.md +0 -0
  113. {wavemind-2.2.0 → wavemind-2.2.2}/docs/PROJECT_BOARD.md +0 -0
  114. {wavemind-2.2.0 → wavemind-2.2.2}/docs/RELEASE.md +0 -0
  115. {wavemind-2.2.0 → wavemind-2.2.2}/docs/assets/wavemind-social-card.svg +0 -0
  116. {wavemind-2.2.0 → wavemind-2.2.2}/examples/agent_with_memory.py +0 -0
  117. {wavemind-2.2.0 → wavemind-2.2.2}/examples/demo.py +0 -0
  118. {wavemind-2.2.0 → wavemind-2.2.2}/examples/dynamic_memory_demo.py +0 -0
  119. {wavemind-2.2.0 → wavemind-2.2.2}/examples/framework_integrations.py +0 -0
  120. {wavemind-2.2.0 → wavemind-2.2.2}/examples/langchain_memory.py +0 -0
  121. {wavemind-2.2.0 → wavemind-2.2.2}/examples/sharded_memory.py +0 -0
  122. {wavemind-2.2.0 → wavemind-2.2.2}/install.bat +0 -0
  123. {wavemind-2.2.0 → wavemind-2.2.2}/install.sh +0 -0
  124. {wavemind-2.2.0 → wavemind-2.2.2}/requirements.txt +0 -0
  125. {wavemind-2.2.0 → wavemind-2.2.2}/setup.cfg +0 -0
  126. {wavemind-2.2.0 → wavemind-2.2.2}/tests/test_agent_memory_benchmark.py +0 -0
  127. {wavemind-2.2.0 → wavemind-2.2.2}/tests/test_ann_index_curve_benchmark.py +0 -0
  128. {wavemind-2.2.0 → wavemind-2.2.2}/tests/test_api_process_persistence.py +0 -0
  129. {wavemind-2.2.0 → wavemind-2.2.2}/tests/test_benchmark_charts.py +0 -0
  130. {wavemind-2.2.0 → wavemind-2.2.2}/tests/test_core_persistence.py +0 -0
  131. {wavemind-2.2.0 → wavemind-2.2.2}/tests/test_dynamic_memory_benchmark.py +0 -0
  132. {wavemind-2.2.0 → wavemind-2.2.2}/tests/test_field_graph.py +0 -0
  133. {wavemind-2.2.0 → wavemind-2.2.2}/tests/test_framework_adapters.py +0 -0
  134. {wavemind-2.2.0 → wavemind-2.2.2}/tests/test_import_benchmark.py +0 -0
  135. {wavemind-2.2.0 → wavemind-2.2.2}/tests/test_langchain_integration.py +0 -0
  136. {wavemind-2.2.0 → wavemind-2.2.2}/tests/test_locomo_memory_benchmark.py +0 -0
  137. {wavemind-2.2.0 → wavemind-2.2.2}/tests/test_long_memory_evidence_benchmark.py +0 -0
  138. {wavemind-2.2.0 → wavemind-2.2.2}/tests/test_longmemeval_memory_benchmark.py +0 -0
  139. {wavemind-2.2.0 → wavemind-2.2.2}/tests/test_open_retrieval_benchmark.py +0 -0
  140. {wavemind-2.2.0 → wavemind-2.2.2}/tests/test_postgres_storage.py +0 -0
  141. {wavemind-2.2.0 → wavemind-2.2.2}/tests/test_semantic_and_latency.py +0 -0
  142. {wavemind-2.2.0 → wavemind-2.2.2}/tests/test_sharding.py +0 -0
  143. {wavemind-2.2.0 → wavemind-2.2.2}/wavemind/__main__.py +0 -0
  144. {wavemind-2.2.0 → wavemind-2.2.2}/wavemind/benchmark.py +0 -0
  145. {wavemind-2.2.0 → wavemind-2.2.2}/wavemind/encoders.py +0 -0
  146. {wavemind-2.2.0 → wavemind-2.2.2}/wavemind/field_graph.py +0 -0
  147. {wavemind-2.2.0 → wavemind-2.2.2}/wavemind/importers.py +0 -0
  148. {wavemind-2.2.0 → wavemind-2.2.2}/wavemind/integrations/__init__.py +0 -0
  149. {wavemind-2.2.0 → wavemind-2.2.2}/wavemind/integrations/autogen.py +0 -0
  150. {wavemind-2.2.0 → wavemind-2.2.2}/wavemind/integrations/crewai.py +0 -0
  151. {wavemind-2.2.0 → wavemind-2.2.2}/wavemind/integrations/langchain.py +0 -0
  152. {wavemind-2.2.0 → wavemind-2.2.2}/wavemind/integrations/langgraph.py +0 -0
  153. {wavemind-2.2.0 → wavemind-2.2.2}/wavemind/integrations/llamaindex.py +0 -0
  154. {wavemind-2.2.0 → wavemind-2.2.2}/wavemind/observability.py +0 -0
  155. {wavemind-2.2.0 → wavemind-2.2.2}/wavemind/sharding.py +0 -0
  156. {wavemind-2.2.0 → wavemind-2.2.2}/wavemind/storage.py +0 -0
  157. {wavemind-2.2.0 → wavemind-2.2.2}/wavemind.egg-info/dependency_links.txt +0 -0
  158. {wavemind-2.2.0 → wavemind-2.2.2}/wavemind.egg-info/entry_points.txt +0 -0
  159. {wavemind-2.2.0 → wavemind-2.2.2}/wavemind.egg-info/requires.txt +0 -0
  160. {wavemind-2.2.0 → wavemind-2.2.2}/wavemind.egg-info/top_level.txt +0 -0
@@ -0,0 +1,28 @@
+ FROM python:3.11-slim
+
+ ARG INSTALL_OPTIONAL=false
+ ARG INSTALL_OTEL=false
+ ARG INSTALL_PRODUCTION=false
+
+ ENV PYTHONDONTWRITEBYTECODE=1
+ ENV PYTHONUNBUFFERED=1
+ ENV WAVEMIND_DB=/data/wavemind.sqlite3
+ ENV WAVEMIND_LOG_LEVEL=INFO
+
+ WORKDIR /app
+
+ RUN if [ "$INSTALL_OPTIONAL" = "true" ] || [ "$INSTALL_PRODUCTION" = "true" ]; then apt-get update && apt-get install -y --no-install-recommends build-essential && rm -rf /var/lib/apt/lists/*; fi
+
+ COPY README.md pyproject.toml requirements.txt requirements-optional.txt ./
+ RUN pip install --no-cache-dir -r requirements.txt \
+     && if [ "$INSTALL_OPTIONAL" = "true" ]; then pip install --no-cache-dir -r requirements-optional.txt; fi \
+     && if [ "$INSTALL_OTEL" = "true" ]; then pip install --no-cache-dir "opentelemetry-api>=1.25" "opentelemetry-sdk>=1.25" "opentelemetry-exporter-otlp>=1.25" "opentelemetry-instrumentation-fastapi>=0.46b0"; fi
+
+ COPY wavemind ./wavemind
+ COPY wavemind_v2.py ./wavemind_v2.py
+ RUN if [ "$INSTALL_PRODUCTION" = "true" ]; then pip install --no-cache-dir -e ".[production]"; else pip install --no-cache-dir -e .; fi
+
+ VOLUME ["/data", "/backups"]
+ EXPOSE 8000
+
+ CMD ["uvicorn", "wavemind.api:create_app", "--factory", "--host", "0.0.0.0", "--port", "8000"]
@@ -14,13 +14,18 @@ include docs/RELEASE.md
  include docs/PROJECT_BOARD.md
  include docs/DEMO_SCRIPT.md
  include docs/LAUNCH_KIT.md
+ include docs/BENCHMARK_BRIEF.md
  include docs/CHROMA_MIGRATION.md
+ include docs/OBSERVABILITY.md
  include docs/RU_LAUNCH_POSTS.md
  include docs/USE_CASES.md
  include docs/assets/benchmark-summary.svg
  include docs/assets/wavemind-social-card.svg
+ include docs/assets/wavemind-demo.gif
  include benchmarks/*.py
  include benchmarks/*.json
  include benchmarks/*.md
  include examples/*.py
+ recursive-include examples/observability *
+ recursive-include examples/production-index-profile *
  prune benchmarks/data
@@ -1,6 +1,6 @@
  Metadata-Version: 2.4
  Name: wavemind
- Version: 2.2.0
+ Version: 2.2.2
  Summary: Local-first dynamic memory field with vector search and wave-field re-ranking
  License-Expression: MIT
  Project-URL: Homepage, https://github.com/CaspianG/wavemind
@@ -66,6 +66,8 @@ users or projects isolated.
 
  <img src="https://raw.githubusercontent.com/CaspianG/wavemind/main/docs/assets/wavemind-social-card.svg" alt="WaveMind dynamic memory overview" width="820">
 
+ <img src="https://raw.githubusercontent.com/CaspianG/wavemind/main/docs/assets/wavemind-demo.gif" alt="WaveMind dynamic memory terminal demo" width="820">
+
  [Quick Start](#quick-start) |
  [CLI](#cli-cheat-sheet) |
  [Studio](#wavemind-studio) |
@@ -77,6 +79,8 @@ users or projects isolated.
  [Use Cases](docs/USE_CASES.md) |
  [HTTP API](#http-api) |
  [Benchmarks](#benchmark) |
+ [Benchmark Brief](docs/BENCHMARK_BRIEF.md) |
+ [Research Branches](#research-branches) |
  [Roadmap](#roadmap) |
  [Contributing](#contributing) |
  [Limitations](#known-limitations)
@@ -159,6 +163,7 @@ Start here if you only want to use WaveMind from the terminal:
  | Show first-run help | `wavemind quickstart` |
  | Store a memory | `wavemind remember "Andrey prefers short answers" --namespace user:42` |
  | Search memory | `wavemind query "answer style" --namespace user:42` |
+ | Consolidate active patterns | `wavemind consolidate --namespace user:42 --seed "Rust compiler systems"` |
  | Open local dashboard | `wavemind studio` |
  | See stored state | `wavemind stats --namespace user:42` |
  | Delete a namespace | `wavemind forget --namespace user:42` |
@@ -272,11 +277,12 @@ wavemind --db ./state/app_memory.sqlite3 query "answer style" --namespace user:4
  | CrewAI or AutoGen loop | The adapters in `wavemind.integrations` |
  | Node, Go, Ruby, PHP, or no-code app | `wavemind serve` and the HTTP API |
  | Personal knowledge base | Store notes by project namespace and query locally |
- | Support or CRM workflow | Store customer issues, resolutions, preferences, and corrections |
- | Research or trading notebook | Store observations with source metadata and TTL for temporary hypotheses |
+ | Support or CRM workflow | Customer issues, resolutions, preferences, corrections, TTL, and namespace isolation. See [`examples/customer_support_memory.py`](examples/customer_support_memory.py). |
+ | Research or analyst notebook | Findings, hypotheses, decisions, source metadata, TTL, and project isolation. See [`examples/research_notebook_memory.py`](examples/research_notebook_memory.py). |
 
  For migrations from existing local vector memory, start with
- [`docs/CHROMA_MIGRATION.md`](docs/CHROMA_MIGRATION.md).
+ [`docs/CHROMA_MIGRATION.md`](docs/CHROMA_MIGRATION.md). The guide has a tested
+ offline fixture at [`examples/chroma_migration.py`](examples/chroma_migration.py).
 
  ## Minimal Agent Loop
 
@@ -322,6 +328,24 @@ python examples/dynamic_memory_demo.py
  That demo shows corrected facts outranking stale facts, temporary memory
  expiring, namespace isolation, and index-health reporting.
 
+ To see the same behavior in a practical support/CRM workflow:
+
+ ```sh
+ python examples/customer_support_memory.py
+ ```
+
+ That demo stores customer preferences, billing tickets, stale CRM data,
+ temporary discount codes, and separate customer namespaces.
+
+ To see source-aware research memory:
+
+ ```sh
+ python examples/research_notebook_memory.py
+ ```
+
+ That demo stores analyst findings, temporary hypotheses, decisions, source
+ metadata, and isolated project namespaces.
+
  ## How The Memory Field Works
 
  ```mermaid
@@ -335,6 +359,8 @@ flowchart LR
  R --> P["app, search UI, prompt, API, or tool"]
  P --> F["recall feedback updates hotness / priority"]
  F --> D
+ F --> C["consolidate active clusters"]
+ C --> D
  ```
 
  The wave field is the dynamic layer around stored memories. It is not a
@@ -350,12 +376,19 @@ memories should still matter.
  | TTL | This fact is temporary. | Drops out after expiry. |
  | namespace and tags | This belongs to one user/project/type. | Prevents cross-user or cross-topic leakage. |
  | graph dynamics | Related memories can excite or inhibit each other. | Helps clusters and corrections behave like memory, not a flat list. |
+ | consolidation | Active clusters can become durable concept memories. | Turns repeated patterns into inspectable higher-level memories with provenance. |
 
  Technically, the current `MemoryFieldGraph` is a discrete graph over stored
  memories, not a continuous mathematical physics field. That honesty matters:
  WaveMind is useful today as a dynamic memory engine, while the research path is
  to make the field dynamics more explicit, measurable, and scalable.
 
+ Self-organization is now part of the core surface. `consolidate_concepts()`,
+ `wavemind consolidate`, and `POST /consolidate` can turn an active graph cluster
+ into a new stored memory such as `Consolidated memory: systems...` without an
+ LLM call. The generated memory keeps the source memory ids in metadata, so it is
+ auditable instead of being a hidden summary.
+
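Reviewer sketch, not part of the package diff: the added README paragraph says a consolidated memory keeps its source memory ids in metadata. The Python below is a minimal illustration of that provenance idea only. The `Memory` dataclass, the `consolidate` helper, the `source_ids` metadata key, and the two-member minimum are all hypothetical; this is not WaveMind's `consolidate_concepts()`, whose signature does not appear in this diff. The `min_energy` default mirrors the `0.01` value used in the README's `POST /consolidate` example.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Memory:
    """Hypothetical stand-in for a stored memory with a graph energy value."""
    id: str
    text: str
    energy: float
    metadata: Dict[str, object] = field(default_factory=dict)


def consolidate(cluster: List[Memory], label: str,
                min_energy: float = 0.01) -> Optional[Memory]:
    """Build one concept memory from the active members of a cluster.

    No LLM is involved: the text is a fixed template and the provenance
    (ids of the contributing memories) is recorded in metadata.
    """
    active = [m for m in cluster if m.energy >= min_energy]
    if len(active) < 2:  # assumption: a single memory is not a cluster
        return None
    mean_energy = sum(m.energy for m in active) / len(active)
    return Memory(
        id=f"concept:{label}",
        text=f"Consolidated memory: {label}",
        energy=mean_energy,
        metadata={"kind": "consolidated",
                  "source_ids": [m.id for m in active]},
    )
```

Because the source ids travel with the concept memory, an auditor can walk back from the summary to the exact memories that produced it.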
  ## Optional Embeddings
 
  For sentence-transformer embeddings:
@@ -418,6 +451,10 @@ Optional pgvector environment variables:
  - `WAVEMIND_PGVECTOR_COLLECTION` - collection key, default `default`.
  - `WAVEMIND_PGVECTOR_CREATE_HNSW=1` - create an HNSW index using
    `vector_cosine_ops` when the installed pgvector version supports it.
+ - `WAVEMIND_PGVECTOR_HNSW_M` - optional HNSW graph degree for index creation.
+ - `WAVEMIND_PGVECTOR_HNSW_EF_CONSTRUCTION` - optional HNSW build accuracy setting.
+ - `WAVEMIND_PGVECTOR_EF_SEARCH` - optional per-query HNSW search depth. Increase
+   it when pgvector is fast but recall is too low.
 
  If `WAVEMIND_PGVECTOR_DSN` is missing, WaveMind raises a clear error instead of
  silently falling back to another index backend.
@@ -438,6 +475,125 @@ production latency and durability should be measured against a real Qdrant
  service. If `WAVEMIND_QDRANT_URL` is missing, WaveMind raises a clear error
  instead of silently falling back to another backend.
 
+ ## Scale Readiness
+
+ WaveMind now includes an explicit scale preflight:
+
+ ```sh
+ wavemind scale-plan --target-memories 50000
+ ```
+
+ For JSON output in CI or deployment checks:
+
+ ```sh
+ wavemind --db ./state/wavemind.sqlite3 scale-plan --target-memories 50000 --json
+ ```
+
+ To fail a deployment preflight when the plan needs action:
+
+ ```sh
+ wavemind --db ./state/wavemind.sqlite3 scale-plan --target-memories 50000 --fail-on action_required --json
+ ```
+
+ If you only want a plan for a future size without loading optional index
+ packages:
+
+ ```sh
+ wavemind --index faiss scale-plan --current-memories 10000 --target-memories 50000 --json
+ ```
+
+ The scale plan reports:
+
+ | field | meaning |
+ |---|---|
+ | `tier` | `small`, `medium`, `large-local`, `production-service`, or `million-plus`. |
+ | `status` | `ok`, `watch`, `action_required`, or `architecture_required`. |
+ | `recommended_index` | The candidate-index class to use before growth. |
+ | `warnings` | Why the current path may fail at the target size. |
+ | `actions` | Concrete setup, benchmark, rebuild, and index-health steps. |
+
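Reviewer sketch, not part of the package diff: one way to picture the `--fail-on` gate is as a severity comparison over the `status` field. The sketch assumes the four status values are ordered from least to most severe in the order the README lists them, and that a check fails when the reported status is at or above the threshold. Neither assumption is confirmed by this diff, and `SEVERITY`, `should_fail`, and `exit_code` are hypothetical names.

```python
import json

# Assumed ordering, least to most severe, taken from the order in the README table.
SEVERITY = ["ok", "watch", "action_required", "architecture_required"]


def should_fail(plan: dict, fail_on: str) -> bool:
    """Return True when the plan status is at least as severe as `fail_on`."""
    return SEVERITY.index(plan["status"]) >= SEVERITY.index(fail_on)


def exit_code(plan_json: str, fail_on: str = "action_required") -> int:
    """Map a JSON scale plan to a process exit code for a CI preflight step."""
    plan = json.loads(plan_json)
    return 1 if should_fail(plan, fail_on) else 0
```

A CI job could pipe the `--json` output into a check like this when it needs a stricter or looser threshold than the CLI run used.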
+ The same scale preflight is available over HTTP:
+
+ ```sh
+ curl "http://127.0.0.1:8000/scale-plan?target_memories=50000"
+ ```
+
+ Rule of thumb:
+
+ | target memories | recommended path |
+ |---:|---|
+ | up to 1000 | SQLite + NumPy exact index. |
+ | 1000 to 5000 | NumPy can work, but benchmark real queries. |
+ | 5000 to 50000 | Persisted FAISS for local single-node, or Qdrant service. |
+ | 50000 to 1M | Service-backed candidate index, namespace sharding, measured p95/p99. |
+ | above 1M | External vector database plus WaveMind as the memory-policy layer. |
+
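Reviewer sketch, not part of the package diff: the rule-of-thumb table can be read as a simple threshold lookup. The Python below encodes the five rows as written. Pairing each row with one of the five `tier` names, in order, is an assumption, as is treating each upper bound as inclusive; `TIERS` and `recommend_path` are hypothetical names and do not describe `wavemind/scale.py`.

```python
from typing import List, Optional, Tuple

# (inclusive upper bound, assumed tier name, recommended path from the README table)
TIERS: List[Tuple[Optional[int], str, str]] = [
    (1_000, "small", "SQLite + NumPy exact index"),
    (5_000, "medium", "NumPy can work, but benchmark real queries"),
    (50_000, "large-local", "Persisted FAISS for local single-node, or Qdrant service"),
    (1_000_000, "production-service",
     "Service-backed candidate index, namespace sharding, measured p95/p99"),
    (None, "million-plus",
     "External vector database plus WaveMind as the memory-policy layer"),
]


def recommend_path(target_memories: int) -> Tuple[str, str]:
    """Return (tier, recommended path) for a target memory count."""
    if target_memories < 0:
        raise ValueError("target_memories must be non-negative")
    for upper, tier, path in TIERS:
        if upper is None or target_memories <= upper:
            return tier, path
    raise AssertionError("unreachable: the last tier has no upper bound")
```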
+ Scale readiness profile:
+
+ ```sh
+ python benchmarks/scale_readiness_benchmark.py --simulated-memories 1000000
+ ```
+
+ Checked-in result:
+
+ | profile | result |
+ |---|---:|
+ | Cluster planner | 4096 namespaces, 4 nodes, replication factor 2, single-node loss availability `1.000`. |
+ | Hot cache | 2000 lookups, hit rate `0.920`, p99 lookup `0.01 ms`. |
+ | Structured payloads | image/audio/table/event retrieval, precision@1 `1.000`, p99 `1.27 ms`. |
+
+ This profile validates routing, cache behavior, and structured payload handling.
+ It is not a 10M-vector load test. Real 100k, 1M, and 10M latency claims should
+ come from service-backed FAISS/Qdrant/pgvector load tests on production-like
+ hardware.
+
+ Cluster placement planning:
+
+ ```sh
+ wavemind cluster-plan \
+   --namespace-count 4096 \
+   --node node-a=10.0.0.1:8000 \
+   --node node-b=10.0.0.2:8000 \
+   --node node-c=10.0.0.3:8000 \
+   --replication-factor 2 \
+   --kubernetes \
+   --json
+ ```
+
+ This uses deterministic rendezvous placement so each namespace has a primary
+ and replica set. The emitted Kubernetes StatefulSet manifest is a deployment
+ starting point; it does not claim Raft consensus or automatic distributed
+ SQLite writes.
+
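Reviewer sketch, not part of the package diff: "deterministic rendezvous placement" usually refers to highest-random-weight (rendezvous) hashing. The Python below is a generic version of that technique, offered only to show why placement is stable across runs and node orderings. It is not the code in `wavemind/cluster.py`; the `score` and `place` names, the SHA-256 scoring, and the `|` separator are assumptions.

```python
import hashlib
from typing import List


def score(namespace: str, node: str) -> int:
    """Deterministic pseudo-random weight for a (namespace, node) pair."""
    digest = hashlib.sha256(f"{namespace}|{node}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def place(namespace: str, nodes: List[str], replication_factor: int = 2) -> List[str]:
    """Return the primary followed by replicas for one namespace.

    Every node is scored against the namespace and the highest scores win,
    so the result does not depend on the order of `nodes`.
    """
    ranked = sorted(nodes, key=lambda n: score(namespace, n), reverse=True)
    return ranked[:replication_factor]
```

With this scheme, removing a node only moves the namespaces that node owned: the next-ranked node takes over and every other placement stays put, which is the property that makes single-node loss cheap to plan for.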
+ The same planner is available over HTTP as `POST /cluster-plan`.
+
+ ## Structured And Multimodal Memory
+
+ WaveMind can store non-text memories as structured text plus metadata. This is
+ useful for product events, tables, call transcripts, and image/audio captions
+ while keeping the same query API.
+
+ ```python
+ from wavemind import WaveMind, image_payload, remember_payload
+
+ memory = WaveMind()
+ remember_payload(
+     memory,
+     image_payload("s3://demo/chart.png", caption="enterprise revenue expansion chart"),
+     namespace="research",
+ )
+ print(memory.query("enterprise expansion chart", namespace="research")[0].metadata)
+ ```
+
+ Supported payload helpers:
+
+ | helper | use case |
+ |---|---|
+ | `image_payload()` | image URI plus caption or alt text |
+ | `audio_payload()` | audio URI plus transcript or summary |
+ | `table_payload()` | compact table preview with row count |
+ | `event_payload()` | structured product, user, or system event |
+
  ## Storage Backends
 
  SQLite is the default source of truth. For multi-tenant production deployments,
@@ -512,18 +668,23 @@ curl http://127.0.0.1:8000/audit?namespace=demo
  curl http://127.0.0.1:8000/metrics
  curl http://127.0.0.1:8000/observability
  curl http://127.0.0.1:8000/index/health
+ curl "http://127.0.0.1:8000/scale-plan?target_memories=50000"
  curl -X POST http://127.0.0.1:8000/index/rebuild
+ curl -X POST http://127.0.0.1:8000/consolidate -H "Content-Type: application/json" -d '{"namespace":"demo","seed_text":"Rust compiler systems","min_energy":0.01}'
  curl -X POST http://127.0.0.1:8000/backup -H "Content-Type: application/json" -d '{"path":"./backups","keep_last":7}'
  ```
 
  `/audit` returns mutation events such as `remember`, `forget`, `backup`, and
- `purge_expired`. Query audit is opt-in with `WAVEMIND_AUDIT_QUERIES=1` because
+ `consolidate_concept`. Query audit is opt-in with `WAVEMIND_AUDIT_QUERIES=1` because
  writing an audit row for every query changes latency. `/metrics` returns a
  Prometheus-compatible text payload without adding a required dependency.
  `/index/health` reports source-of-truth versus candidate-index consistency.
  `/index/rebuild` rebuilds the candidate index from stored active memories and
  logs an `index_rebuild` audit event.
 
+ Full observability guide and local Prometheus/OTEL examples:
+ [`docs/OBSERVABILITY.md`](docs/OBSERVABILITY.md).
+
  OpenTelemetry traces are optional and off by default:
 
  ```sh
@@ -642,11 +803,17 @@ Framework examples in this repository:
  | LangChain memory | `examples/langchain_memory.py` |
  | OpenAI/OpenRouter-style agent loop | `examples/agent_with_memory.py` |
  | LangGraph hooks | `wavemind.integrations.langgraph`, `examples/framework_integrations.py` |
- | LlamaIndex-style retriever | `wavemind.integrations.llamaindex`, `examples/framework_integrations.py` |
+ | LlamaIndex-style retriever | `wavemind.integrations.llamaindex`, `examples/llamaindex_retriever.py` |
  | CrewAI-style tools | `wavemind.integrations.crewai`, `examples/framework_integrations.py` |
  | AutoGen-style hooks | `wavemind.integrations.autogen`, `examples/framework_integrations.py` |
  | Namespace sharding | `examples/sharded_memory.py` |
 
+ Run the dedicated offline LlamaIndex-style retriever example:
+
+ ```sh
+ python examples/llamaindex_retriever.py
+ ```
+
  ## OpenClaw Integration
 
  [OpenClaw memory](https://docs.openclaw.ai/concepts/memory) is file-centered:
@@ -782,6 +949,18 @@ memory benchmark:
  In short: static vector search answers "what is nearest?" Dynamic memory also
  asks "what is still relevant, reinforced, scoped, and allowed to be remembered?"
 
+ ## Research Branches
+
+ The main branch stays focused on the core WaveMind library: dynamic memory,
+ storage, indexes, APIs, integrations, and public memory benchmarks.
+
+ Experimental domains live in separate branches so they can move quickly without
+ overloading the main README:
+
+ | Branch | Scope |
+ |---|---|
+ | [`research/crypto-pattern-memory`](https://github.com/CaspianG/wavemind/tree/research/crypto-pattern-memory) | OHLCV pattern-memory research, historical analogue retrieval, and future backtest experiments. |
+
  ## Benchmark
 
  WaveMind tracks benchmarks in two layers:
@@ -791,6 +970,7 @@ WaveMind tracks benchmarks in two layers:
 
  Machine-readable benchmark matrix: `benchmarks/benchmark_matrix_results.json`.
  Full generated benchmark report: [`benchmarks/BENCHMARK_REPORT.md`](benchmarks/BENCHMARK_REPORT.md).
+ Compact benchmark leaderboard: [`benchmarks/BENCHMARK_LEADERBOARD.md`](benchmarks/BENCHMARK_LEADERBOARD.md).
 
  Visual summary generated from the checked-in JSON results:
 
@@ -828,14 +1008,19 @@ Current read:
  |---|---|---|
  | Public agent-memory evidence | On official LoCoMo `locomo10.json`, WaveMind reaches `evidence_recall@5 0.386` with hash embeddings and `0.547` with sentence-transformers. Fair namespace-filtered Chroma reaches `0.257` / `0.407`; Qdrant reaches `0.263` / `0.409`. | WaveMind retrieves more labeled evidence. Chroma is still the fastest static vector-store baseline. Qdrant local payload filtering is much slower than service-mode Qdrant should be. |
  | Public retrieval sanity check | On BEIR SciFact, WaveMind reaches `nDCG@10 0.354`, `Recall@10 0.482`; Qdrant matches that quality; Chroma reaches `0.350` / `0.467` with identical hash embeddings. | Same-embedding retrieval quality is close. Chroma is fastest at `1.79 ms`; Qdrant local is `17.71 ms`; WaveMind exact path is `117.02 ms`. |
+ | Public multilingual retrieval | On NoMIRACL Russian, sampled at 200 queries / 5000 compact candidate passages, WaveMind reaches `nDCG@10 0.434`, `Recall@10 0.516`, matching Qdrant and staying within `0.002` nDCG of Chroma on identical hash embeddings. | Russian same-embedding quality is at parity. Chroma is faster at `2.60 ms`; WaveMind is `10.22 ms`; Qdrant local is `18.86 ms`. |
  | Static agent recall | WaveMind `precision@1` equals Chroma at `0.82`; WaveMind `precision@3` is `0.90` vs Chroma `0.88`. | Competitive quality, but Chroma is faster on the static vector-store path. |
  | Dynamic memory policy | WaveMind reaches `1.00` stale suppression; Chroma static is `0.00`. | This is the strongest current differentiation: hotness, TTL, corrections, and namespaces. |
- | Field memory dynamics | Graph-enabled WaveMind reaches `1.00` `precision@1`, `1.00` stale suppression, and `1.00` concept formation vs static WaveMind at `0.20` / `0.20` / `0.00`. | This is still synthetic, but it is the first regression check for memory-to-memory excitation, conflict inhibition, and decay. |
+ | Field memory dynamics | Graph-enabled WaveMind reaches `1.00` `precision@1`, `1.00` stale suppression, `1.00` concept formation, and `1.00` durable concept consolidation vs static WaveMind at `0.20` / `0.20` / `0.00` / `0.00`. | This is still synthetic, but it is now a regression check for memory-to-memory excitation, conflict inhibition, decay, and self-organization into auditable concept memories. |
  | Long-term evidence | WaveMind reaches `1.00` evidence recall@5, `1.00` precision@1, and `1.00` stale suppression on the synthetic long-memory evidence benchmark. | This is the first proof-shaped benchmark for agent memory: it measures whether stale/corrected/expired/cross-user facts stay out of retrieved evidence. |
  | Capacity | Static `precision@1` is `0.94` at 5000 memories; dynamic policy keeps `1.00` on the current checks. | Quality is holding on these checks, but dynamic latency must be optimized. |
  | LongMemEval full retrieval | On the official LongMemEval-S cleaned file, 470 non-abstention session-level questions, WaveMind reaches `evidence_recall@5 0.782` and `precision@1 0.696`; Chroma static reaches `0.518` / `0.355`; Qdrant static reaches `0.520` / `0.355`. | This is now the strongest public memory result in the repo. It is retrieval-only, not final answer quality. |
+ | LongMemEval 50-query smoke | On the first 50 non-abstention LongMemEval-S questions, WaveMind reaches `evidence_recall@5 0.920`, `precision@1 0.760`, and `MRR@5 0.827`; Chroma/Qdrant static reach `0.600`, `0.260`, and `0.385`. | This is the fast regression profile for checking current changes before rerunning the full LongMemEval profile. WaveMind wins on quality; latency still needs work. |
  | ANN/index curve | At 50000 generated 128-d vectors, NumPy exact keeps `recall@10 1.000` at `6.49 ms`; quantized int8 keeps `0.934` at `24.92 ms`; Annoy is faster at `4.92 ms` but drops to `0.730` recall; Qdrant local keeps `1.000` recall at `43.49 ms`. | Current local scale boundary is clear: quantized search needs kernel work, Annoy needs tuning/FAISS, and Qdrant should be tested in service mode for a fair production comparison. |
- | Next public proof | LongMemEval / LoCoMo answer generation with a local LLM. | Retrieval is now measured. The next serious number should test answer accuracy, abstention, and faithfulness. |
+ | Production load | At 100000 generated 128-d vectors, service-mode Qdrant reaches `recall@10 1.000`, avg `10.76 ms`; pgvector HNSW reaches `0.736`, avg `17.76 ms`; at 1M vectors Qdrant reaches `0.506`, avg `45.81 ms`. | Qdrant service is already usable at 100k. The 1M result is not production-grade yet: large-N service settings need tuning before claiming million-memory recall. |
1021
+ | Scale readiness | Deterministic 1M-memory simulation validates 4096 namespace placements over 4 nodes with replication factor 2, single-node-loss availability `1.000`, hot-cache hit rate `0.920`, and structured payload precision@1 `1.000`. | This validates the routing, cache, and payload foundations; it is not a 10M-vector load test. Real 100k-10M production latency needs service-backed load tests. |
1022
+ | Memory competitor adapters | WaveMind reaches `precision@1 0.80`, `precision@3 1.00`, stale suppression `1.00` on the small adapter profile. Mem0, Zep, and LangGraph are listed as skipped unless their real packages/services are configured. | This prevents fake competitor claims. The adapter harness is ready; real Mem0/Zep/LangGraph results still need configured installs. |
1023
+ | LongMemEval local answer generation | With the same local Ollama `qwen2.5:1.5b`, WaveMind reaches `exact_match 0.240`, `contains_answer 0.380`, `token_f1 0.333`, and `evidence_recall@5 0.920`; Chroma and Qdrant static both reach `0.120`, `0.160`, `0.170`, and `0.600`. | This is the first checked-in end-to-end answer benchmark against Chroma/Qdrant. It is still a 50-question lightweight smoke run, not a full LongMemEval leaderboard score. |
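The scale-readiness row above reports replica placement and single-node-loss availability. A minimal sketch of that kind of check follows, assuming rendezvous (highest-random-weight) hashing as the placement rule. The README does not say which placement scheme WaveMind uses, and `place` and `single_node_loss_availability` are hypothetical names introduced here for illustration.

```python
import hashlib


def place(namespace: str, nodes: list[str], replicas: int = 2) -> list[str]:
    """Pick `replicas` distinct nodes by rendezvous hashing (an assumed scheme)."""
    def score(node: str) -> int:
        digest = hashlib.sha256(f"{namespace}|{node}".encode()).digest()
        return int.from_bytes(digest[:8], "big")

    # Highest score wins; slicing a sorted list guarantees distinct owners.
    return sorted(nodes, key=score, reverse=True)[:replicas]


def single_node_loss_availability(namespaces, nodes, replicas=2) -> float:
    """Fraction of (namespace, failed node) pairs that still have a live replica."""
    checks = 0
    alive = 0
    for ns in namespaces:
        owners = place(ns, nodes, replicas)
        for failed in nodes:
            checks += 1
            if any(owner != failed for owner in owners):
                alive += 1
    return alive / checks


nodes = [f"node-{i}" for i in range(4)]
namespaces = [f"ns-{i}" for i in range(4096)]
availability = single_node_loss_availability(namespaces, nodes)
print(f"{availability:.3f}")
```

With two distinct replicas per namespace, any single node failure leaves at least one owner, so this availability is 1.000 by construction. A simulation of this shape becomes informative mainly when several nodes fail at once or when the replication factor drops to 1.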
839
1024
 
840
1025
  ### Real Benchmark Matrix
841
1026
 
@@ -843,17 +1028,22 @@ Current read:
843
1028
  |---|---|---|---|---|
844
1029
  | Agent user-memory retrieval | Natural-language recall over 200 user facts. | implemented | Chroma | Match Chroma `precision@1`, beat `precision@3`, stay under 5 ms at 200 memories. |
845
1030
  | Dynamic memory policy | Hot memory, TTL, corrections, stale suppression, namespace isolation. | implemented | Chroma static | Keep `precision@1` and stale suppression at 1.00, cut avg latency below 10 ms at 1000 memories. |
846
- | Field memory graph dynamics | Related memories excite each other, newer conflicting memories suppress stale facts, graph energy decays, and active clusters expose concept candidates. | implemented | WaveMind static | Keep `precision@1`, stale suppression, and concept formation at 1.00 while moving from synthetic checks to LoCoMo/LongMemEval evidence. |
1031
+ | Field memory graph dynamics | Related memories excite each other, newer conflicting memories suppress stale facts, graph energy decays, and active clusters can become durable concept memories. | implemented | WaveMind static | Keep `precision@1`, stale suppression, concept formation, and concept consolidation at 1.00 while moving from synthetic checks to LoCoMo/LongMemEval evidence. |
847
1032
  | WaveMind capacity curve | How recall and latency change at 200 / 1000 / 5000 memories. | implemented | WaveMind-only today | Keep `precision@1 >= 0.95` at 5000 memories and dynamic latency below 20 ms. |
848
1033
  | Long-term memory evidence | Evidence retrieval from long histories with profile, preference, correction, TTL, namespace, and filler noise. | implemented | Static vector / Chroma / Qdrant | Keep this as a small regression test while public LoCoMo and LongMemEval runners carry the stronger evidence claims. |
849
1034
  | BEIR-style open retrieval runner | Public `corpus.jsonl`, `queries.jsonl`, `qrels/*.tsv` datasets with the same metrics for each engine. | implemented | WaveMind / Chroma / Qdrant | Use identical embeddings and report `nDCG@k`, `Recall@k`, `MRR@k`, `precision@1`, and latency. Current checked-in run: BEIR SciFact. |
1035
+ | NoMIRACL Russian retrieval | Russian human-annotated multilingual relevance over compact candidate passages. | implemented | WaveMind / Chroma / Qdrant | Keep same-embedding `nDCG@10` at parity, then rerun with sentence-transformers and full MIRACL Russian when disk/service capacity allows it. |
850
1036
  | ANN/VectorDBBench-style local curve | Recall/latency tradeoff for candidate indexes on generated vectors. | implemented | NumPy exact / quantized int8 / Annoy / Qdrant local | Use this as the local engineering curve; official VectorDBBench remains future work. |
1037
+ | Production index profile | Docker-backed 50000-vector profile for persisted FAISS, Qdrant service, and PostgreSQL/pgvector HNSW. | implemented | FAISS / Qdrant service / pgvector | Keep service-mode candidate generation above `0.95` recall@10 and below 10 ms average query latency at 50000 vectors. |
1038
+ | Production load profile | 100k and 1M service-backed candidate-index checks. | implemented | Qdrant service / pgvector HNSW / FAISS persisted | 100k Qdrant is strong; 1M Qdrant and pgvector require tuning before production claims. |
1039
+ | Scale readiness profile | Cluster placement, single-node-loss simulation, hot-cache behavior, and structured/multimodal payload retrieval. | implemented | Mem0 / Zep / LangGraph persistent memory / GraphRAG target adapters | Use this as production foundation proof before real distributed 100k, 1M, and 10M load tests. |
1040
+ | Memory competitor adapter profile | Dynamic-memory scenario wired for external memory frameworks. | implemented | Mem0 / Zep / LangGraph persistent memory | Report real competitor results only when their packages/services are explicitly configured. |
851
1041
  | [BEIR](https://github.com/beir-cellar/beir) | Standard zero-shot information retrieval quality. | planned | Chroma / Qdrant / FAISS | Stay within 0.02 `nDCG@10` on identical embeddings. |
852
1042
  | [MTEB Retrieval](https://github.com/embeddings-benchmark/mteb) | Separates encoder quality from retrieval-store quality. | planned | Chroma / Qdrant / FAISS | Prove WaveMind does not reduce same-embedding retrieval quality. |
853
- | [MIRACL Russian](https://miracl.ai/) | Multilingual retrieval with Russian relevance judgments. | planned | Chroma / Qdrant / FAISS | Reach same-embedding parity on Russian `nDCG@10`. |
1043
+ | [MIRACL Russian](https://miracl.ai/) | Multilingual retrieval with Russian relevance judgments. | runner ready | Chroma / Qdrant / FAISS | NoMIRACL Russian compact run is implemented; full-corpus MIRACL Russian remains the next heavier profile. |
854
1044
  | [VectorDBBench](https://github.com/zilliztech/VectorDBBench) | Vector database insertion/search/filter/cost-performance benchmark. | planned | Chroma / Qdrant / Milvus / Weaviate / Pinecone / FAISS | Use only after WaveMind has a production index path; today it is a memory layer, not a standalone cloud vector DB. |
855
1045
  | [LoCoMo](https://arxiv.org/abs/2402.17753) | Long conversation memory, temporal consistency, multi-hop recall. Retrieval-only runner is implemented for official `locomo10.json`. | implemented | Static vector / Chroma / Qdrant | Improve answer generation accuracy on top of the stronger sentence-transformers evidence retrieval run. |
856
- | [LongMemEval](https://arxiv.org/abs/2410.10813) | Long-term assistant memory with updates and abstention. | implemented retrieval, answer runner ready | Static vector / Chroma / Qdrant / Mem0-style memory | Add LLM answer quality and abstention after retrieval. |
1046
+ | [LongMemEval](https://arxiv.org/abs/2410.10813) | Long-term assistant memory with updates and abstention. | implemented retrieval + local Ollama answer smoke | Static vector / Chroma / Qdrant / Mem0-style memory | Add stronger LLM answer quality, abstention, and Chroma/Qdrant RAG answer baselines. |
857
1047
  | [LongMemEval-V2](https://arxiv.org/abs/2605.12493) | Web-agent memory: state recall, dynamic state, workflow gotchas. | planned | AgentRunbook-R / Chroma RAG / Qdrant RAG | Prove WaveMind can retrieve compact evidence from agent trajectories. |
858
1048
  | [LMEB](https://github.com/KaLM-Embedding/LMEB) | Long-horizon memory embedding tasks beyond normal passage retrieval. | planned | Embedding-only baselines / Chroma / Qdrant | Choose the default semantic encoder using memory-specific tasks. |
859
1049
  | [RAGBench](https://huggingface.co/datasets/rungalileo/ragbench) | Downstream RAG context and answer quality. | planned | Chroma RAG / Qdrant RAG / Pinecone RAG | Show whether stale-memory suppression improves context relevance. |
@@ -899,6 +1089,36 @@ Qdrant local preserves the same ranking quality and is much faster than the
899
1089
  WaveMind NumPy exact path. The engineering target is a FAISS/Annoy candidate
900
1090
  index with WaveMind's dynamic field policy applied only as a top-k re-ranker.
901
1091
 
1092
+ ### NoMIRACL Russian Retrieval
1093
+
1094
+ WaveMind includes a compact multilingual retrieval runner for
1095
+ [NoMIRACL](https://huggingface.co/datasets/miracl/nomiracl), the negative-aware
1096
+ MIRACL relevance dataset. The checked-in run uses Russian `test.relevant`
1097
+ queries and the compact Russian candidate corpus. It is not a full-corpus
1098
+ MIRACL run; it is a reproducible multilingual relevance benchmark small enough
1099
+ to run on a local machine.
1100
+
1101
+ ```sh
1102
+ python benchmarks/nomiracl_russian_benchmark.py --download --dataset benchmarks/data/nomiracl-russian --engines wavemind chroma qdrant --top-k 10 --limit-queries 200 --limit-corpus 5000 --output benchmarks/nomiracl_russian_results.json
1103
+ ```
1104
+
1105
+ Checked-in NoMIRACL Russian result:
1106
+
1107
+ 200 Russian queries, 5000 compact candidate passages,
1108
+ `HashingTextEncoder`, top-k 10. Full machine-readable result:
1109
+ `benchmarks/nomiracl_russian_results.json`.
1110
+
1111
+ | engine | nDCG@10 | Recall@10 | MRR@10 | precision@1 | avg latency | p95 latency |
1112
+ |---|---:|---:|---:|---:|---:|---:|
1113
+ | WaveMind | 0.434 | 0.516 | 0.489 | 0.410 | 10.22 ms | 15.53 ms |
1114
+ | Chroma | 0.435 | 0.519 | 0.490 | 0.410 | 2.60 ms | 3.44 ms |
1115
+ | Qdrant | 0.434 | 0.516 | 0.489 | 0.410 | 18.86 ms | 24.08 ms |
1116
+
1117
+ Read this as multilingual same-embedding parity, not as a claim that the hash
1118
+ encoder is the best Russian semantic model. The next stronger run should use
1119
+ `sentence-transformers` on the same NoMIRACL split, then full MIRACL Russian
1120
+ when there is enough disk/service capacity.
1121
+
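The table columns above are standard ranking metrics. As a rough guide to how they relate, here is a minimal per-query sketch assuming binary relevance labels and a ranked list without duplicates. `retrieval_metrics` is a hypothetical helper, not the function used by `nomiracl_russian_benchmark.py`, whose exact implementation may differ.

```python
import math


def retrieval_metrics(ranked_ids, relevant_ids, k=10):
    """Binary-relevance nDCG@k, Recall@k, MRR@k and precision@1 for one query."""
    top = ranked_ids[:k]
    relevant = set(relevant_ids)
    gains = [1.0 if doc_id in relevant else 0.0 for doc_id in top]

    # DCG discounts each hit by log2(position + 1), positions starting at 1.
    dcg = sum(g / math.log2(rank + 2) for rank, g in enumerate(gains))
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(rank + 2) for rank in range(ideal_hits))

    first_hit = next((rank for rank, g in enumerate(gains) if g), None)
    return {
        "ndcg": dcg / idcg if idcg else 0.0,
        "recall": sum(gains) / len(relevant) if relevant else 0.0,
        "mrr": 1.0 / (first_hit + 1) if first_hit is not None else 0.0,
        "p1": gains[0] if gains else 0.0,
    }


# One relevant passage retrieved at rank 2, one relevant passage missed.
m = retrieval_metrics(["d3", "d1", "d9"], ["d1", "d7"], k=10)
print(m)
```

Benchmark-level numbers such as those in the table would then be the mean of each value over all queries.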
902
1122
  ### LoCoMo Evidence Retrieval
903
1123
 
904
1124
  WaveMind now includes a retrieval-only runner for the public
@@ -1011,18 +1231,35 @@ result: `benchmarks/longmemeval_evidence_results.json`.
1011
1231
  The Chroma and Qdrant baselines now use the same namespace/payload scope as
1012
1232
  WaveMind. Qdrant is run in local embedded mode; the Qdrant client warns that
1013
1233
  local mode is not recommended above 20000 points, so this latency should not be
1014
- read as a service-mode Qdrant result. The next step is answer-quality evaluation
1015
- with a local LLM.
1234
+ read as a service-mode Qdrant result.
1016
1235
 
1017
- Answer-generation runner:
1236
+ Answer-generation runner with local Ollama:
1018
1237
 
1019
1238
  ```sh
1020
- python benchmarks/longmemeval_answer_benchmark.py --dataset benchmarks/data/longmemeval_s_cleaned.json --provider ollama --model YOUR_LOCAL_MODEL --top-k 5 --output benchmarks/longmemeval_answer_results.json
1239
+ python benchmarks/longmemeval_answer_benchmark.py --dataset benchmarks/data/longmemeval_s_cleaned.json --provider ollama --model YOUR_LOCAL_MODEL --engines wavemind chroma qdrant --top-k 5 --output benchmarks/longmemeval_answer_results.json
1021
1240
  ```
1022
1241
 
1242
+ Checked-in local answer-generation smoke runs:
1243
+
1244
+ 50 non-abstention LongMemEval-S questions, compact retrieved evidence,
1245
+ same `HashingTextEncoder`, same local Ollama model, top-k 5. Full machine-readable results:
1246
+ `benchmarks/longmemeval_answer_qwen25_0_5b_50_results.json` and
1247
+ `benchmarks/longmemeval_answer_qwen25_1_5b_50_results.json`.
1248
+
1249
+ | system | questions | evidence recall@5 | exact match | contains answer | token F1 | avg retrieval | avg generation |
1250
+ |---|---:|---:|---:|---:|---:|---:|---:|
1251
+ | WaveMind + Ollama `qwen2.5:0.5b` | 50 | 0.920 | 0.120 | 0.180 | 0.183 | 2.98 ms | 1428.20 ms |
1252
+ | Chroma static + Ollama `qwen2.5:0.5b` | 50 | 0.600 | 0.100 | 0.120 | 0.126 | 4.10 ms | 1234.69 ms |
1253
+ | Qdrant static + Ollama `qwen2.5:0.5b` | 50 | 0.600 | 0.100 | 0.120 | 0.126 | 63.80 ms | 893.48 ms |
1254
+ | WaveMind + Ollama `qwen2.5:1.5b` | 50 | 0.920 | 0.240 | 0.380 | 0.333 | 2.00 ms | 2153.00 ms |
1255
+ | Chroma static + Ollama `qwen2.5:1.5b` | 50 | 0.600 | 0.120 | 0.160 | 0.170 | 7.05 ms | 2082.38 ms |
1256
+ | Qdrant static + Ollama `qwen2.5:1.5b` | 50 | 0.600 | 0.120 | 0.160 | 0.170 | 100.20 ms | 758.11 ms |
1257
+
1023
1258
  There is also an extractive smoke run that does not require a model:
1024
1259
  `benchmarks/longmemeval_answer_extractive_20_results.json`. It is only a runner
1025
- check, not a meaningful final answer-quality benchmark.
1260
+ check, not a meaningful final answer-quality benchmark. The Ollama runs are real
1261
+ local LLM runs, but still lightweight smoke results rather than official
1262
+ LongMemEval leaderboard scores.
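For readers comparing the `exact match`, `contains answer`, and `token F1` columns, the sketch below shows one common way to compute them. The normalization (lowercasing and stripping punctuation) is an assumption, and `normalize` and `answer_scores` are hypothetical names; the checked-in runner may score answers differently.

```python
import re
from collections import Counter


def normalize(text: str) -> list[str]:
    """Lowercase, drop punctuation, split on whitespace (an assumed normalization)."""
    return re.sub(r"[^\w\s]", " ", text.lower()).split()


def answer_scores(prediction: str, gold: str) -> dict:
    """Score one generated answer against one gold answer."""
    pred, ref = normalize(prediction), normalize(gold)

    # Token overlap counts each shared token at most min(count_pred, count_ref) times.
    overlap = sum((Counter(pred) & Counter(ref)).values())
    precision = overlap / len(pred) if pred else 0.0
    recall = overlap / len(ref) if ref else 0.0
    f1 = 2 * precision * recall / (precision + recall) if overlap else 0.0

    # Pad with spaces so the gold answer must match on token boundaries.
    contains = bool(ref) and f" {' '.join(ref)} " in f" {' '.join(pred)} "
    return {
        "exact_match": float(pred == ref),
        "contains_answer": float(contains),
        "token_f1": f1,
    }


scores = answer_scores("She moved to Berlin in 2021.", "Berlin")
print(scores)
```

The example shows why the three columns diverge: a verbose but correct answer fails exact match, passes the containment check, and gets a low token F1 because of the extra tokens.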
1026
1263
 
1027
1264
  ### ANN Index Curve
1028
1265
 
@@ -1040,13 +1277,12 @@ Add `qdrant-service` when `WAVEMIND_QDRANT_URL` points at a running Qdrant
1040
1277
  service. Add `faiss-persisted` when `WAVEMIND_FAISS_PATH` points at the FAISS
1041
1278
  snapshot file to validate persisted-index startup behavior.
1042
1279
 
1043
- Production profile example:
1280
+ Reproducible Docker production profile:
1044
1281
 
1045
1282
  ```sh
1046
- export WAVEMIND_FAISS_PATH="./state/ann-curve.faiss"
1047
- export WAVEMIND_QDRANT_URL="http://localhost:6333"
1048
- export WAVEMIND_PGVECTOR_DSN="postgresql://user:password@localhost:5432/wavemind"
1049
- python benchmarks/ann_index_curve_benchmark.py --sizes 10000 50000 --dim 128 --queries 100 --top-k 10 --engines faiss-persisted qdrant-service pgvector --output benchmarks/production_index_profile_results.json
1283
+ docker compose -f examples/production-index-profile/docker-compose.yml up -d qdrant postgres
1284
+ docker compose -f examples/production-index-profile/docker-compose.yml run --rm benchmark
1285
+ docker compose -f examples/production-index-profile/docker-compose.yml down
1050
1286
  ```
1051
1287
 
1052
1288
  Checked-in 50000-vector point:
@@ -1059,15 +1295,62 @@ Checked-in 50000-vector point:
1059
1295
  | WaveMind faiss | skipped | - | - | - |
1060
1296
  | Qdrant local | 1.000 | 43.49 ms | 59.68 ms | 17525.7 ms |
1061
1297
 
1298
+ Checked-in production 50000-vector point:
1299
+
1300
+ | engine | recall@10 | avg latency | p95 latency | build |
1301
+ |---|---:|---:|---:|---:|
1302
+ | WaveMind faiss-persisted | 1.000 | 3.52 ms | 7.88 ms | 715.9 ms |
1303
+ | Qdrant service | 1.000 | 4.41 ms | 5.93 ms | 12269.8 ms |
1304
+ | WaveMind pgvector | 0.811 | 10.95 ms | 15.69 ms | 185048.9 ms |
1305
+
1306
+ Checked-in production load points:
1307
+
1308
+ ```sh
1309
+ python benchmarks/production_load_benchmark.py --sizes 100000 --dim 128 --queries 100 --top-k 10 --engines qdrant-service pgvector faiss-persisted
1310
+ python benchmarks/production_load_benchmark.py --sizes 1000000 --dim 128 --queries 50 --top-k 10 --engines qdrant-service --output benchmarks/production_load_qdrant_1m_results.json
1311
+ ```
1312
+
1313
+ | vectors | engine | recall@10 | avg latency | p95 latency | build |
1314
+ |---:|---|---:|---:|---:|---:|
1315
+ | 100000 | Qdrant service | 1.000 | 10.76 ms | 18.78 ms | 39873.2 ms |
1316
+ | 100000 | WaveMind pgvector | 0.736 | 17.76 ms | 23.48 ms | 455703.7 ms |
1317
+ | 100000 | WaveMind faiss-persisted | skipped | - | - | - |
1318
+ | 1000000 | Qdrant service | 0.506 | 45.81 ms | 65.18 ms | 563945.5 ms |
1319
+
1062
1320
  Read this as an engineering curve, not an official VectorDBBench result. Annoy
1063
1321
  is faster than exact NumPy at 50000 vectors but loses too much recall with the
1064
1322
  current settings. The new `quantized` backend compresses vectors and keeps
1065
1323
  `0.934` recall@10 on this run, but the current Python/NumPy kernel is slower
1066
1324
  than exact NumPy; it is a memory-footprint baseline, not a latency win yet.
1067
- FAISS persistence, service-mode Qdrant, and pgvector are now explicit benchmark
1068
- profiles. If a required package, service, or environment variable is missing,
1069
- the runner marks that engine as `skipped` instead of silently falling back to
1070
- another backend.
1325
+ FAISS persistence and service-mode Qdrant now both preserve exact recall at
1326
+ 50000 generated vectors. The checked-in pgvector/HNSW profile uses
1327
+ `WAVEMIND_PGVECTOR_EF_SEARCH=400`, which lifts recall@10 to `0.811` but still
1328
+ misses the `0.95` production target and is slower than the other two profiles.
1329
+ The 100k load profile shows Qdrant service is already viable for candidate
1330
+ generation; the 1M Qdrant profile shows that default service settings are not
1331
+ enough for production recall and need HNSW/search tuning before million-memory
1332
+ claims.
1333
+ If a required package, service, or environment variable is missing, the runner
1334
+ marks that engine as `skipped` instead of silently falling back to another
1335
+ backend.
1336
+
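The skip-instead-of-fallback rule described above can be sketched as follows. The environment variable names appear in this README; the `REQUIRED_ENV` table, the `resolve_engines` helper, and the record layout are assumptions for illustration, not the runner's actual code, and the sketch checks only environment variables, not packages or service reachability.

```python
import os

# Engine name -> environment variable it needs (names taken from the README).
REQUIRED_ENV = {
    "qdrant-service": "WAVEMIND_QDRANT_URL",
    "pgvector": "WAVEMIND_PGVECTOR_DSN",
    "faiss-persisted": "WAVEMIND_FAISS_PATH",
}


def resolve_engines(requested, environ=os.environ):
    """Return one status record per engine; never substitute another backend."""
    results = []
    for engine in requested:
        env_name = REQUIRED_ENV.get(engine)
        if env_name and not environ.get(env_name):
            results.append({
                "engine": engine,
                "status": "skipped",
                "reason": f"{env_name} is not set",
            })
        else:
            results.append({"engine": engine, "status": "ready"})
    return results


statuses = resolve_engines(
    ["qdrant-service", "pgvector"],
    environ={"WAVEMIND_QDRANT_URL": "http://localhost:6333"},
)
for record in statuses:
    print(record)
```

Recording an explicit `skipped` row with a reason keeps result tables honest: a missing backend shows up as a gap rather than as numbers that silently came from a different engine.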
1337
+ ### Memory Competitor Adapter Profile
1338
+
1339
+ WaveMind includes a small dynamic-memory adapter profile for Mem0, Zep, and
1340
+ LangGraph persistent memory. It checks corrections, TTL, namespace isolation,
1341
+ and preference recall. Missing competitors are marked `skipped` with setup
1342
+ reasons instead of being approximated.
1343
+
1344
+ ```sh
1345
+ python benchmarks/memory_competitor_benchmark.py --engines wavemind mem0 zep langgraph
1346
+ ```
1347
+
1348
+ | engine | precision@1 | precision@3 | stale suppression | avg latency |
1349
+ |---|---:|---:|---:|---:|
1350
+ | WaveMind | 0.80 | 1.00 | 1.00 | 0.55 ms |
1351
+ | Mem0 | skipped | - | - | - |
1352
+ | Zep | skipped | - | - | - |
1353
+ | LangGraph persistent memory | skipped | - | - | - |
1071
1354
 
1072
1355
  ### Current Local Runs
1073
1356
 
@@ -1076,13 +1359,13 @@ Field memory dynamics benchmark:
1076
1359
  13 memories, 5 conflicting-fact queries, deterministic local encoder.
1077
1360
  This benchmark isolates the `MemoryFieldGraph`: related memories can spread
1078
1361
  activation, newer conflicting memories inhibit stale facts, graph energy decays,
1079
- and active clusters can surface concept candidates.
1362
+ and active clusters can surface and persist concept memories.
1080
1363
  Full machine-readable result: `benchmarks/field_memory_dynamics_results.json`.
1081
1364
 
1082
- | engine | precision@1 | precision@3 | stale suppression | concept formation | decay ratio | avg latency |
1083
- |---|---:|---:|---:|---:|---:|---:|
1084
- | WaveMind graph | 1.00 | 1.00 | 1.00 | 1.00 | 0.81 | 0.82 ms |
1085
- | WaveMind static | 0.20 | 1.00 | 0.20 | 0.00 | 0.00 | 0.43 ms |
1365
+ | engine | precision@1 | precision@3 | stale suppression | concept formation | concept consolidation | decay ratio | avg latency |
1366
+ |---|---:|---:|---:|---:|---:|---:|---:|
1367
+ | WaveMind graph | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 0.81 | 1.81 ms |
1368
+ | WaveMind static | 0.20 | 1.00 | 0.20 | 0.00 | 0.00 | 0.00 | 0.48 ms |
1086
1369
 
1087
1370
  Run locally from a cloned repository:
1088
1371
 
@@ -1223,6 +1506,7 @@ If you already use Chroma for local memory, see the practical migration guide:
1223
1506
  - Optimal capacity on the current NumPy exact index is up to 1000 records.
1224
1507
  - At 5000 records, one-word `precision@1` is currently 0.72 with the hash encoder; many misses are ambiguous queries where another sentence containing the same word ranks first.
1225
1508
  - For `N > 5000`, the NumPy exact index is still reliable but scales linearly. Annoy is faster at 50000 vectors in the local curve, but current recall is only `0.730`; the `quantized` backend reaches `0.934` recall@10 but is slower than NumPy on the current kernel. Use FAISS or a production vector service before claiming large-scale ANN quality.
1509
+ - Run `wavemind scale-plan --target-memories <N>` before growing a deployment. It is a guardrail, not a benchmark replacement: it tells you when NumPy is no longer the right candidate index and which checks to run next.
1226
1510
  - `sentence-transformers/paraphrase-multilingual-mpnet-base-v2` requires about 420 MB of model files. Benchmark runners cache embeddings so retrieval latency is measured separately from model encoding latency.
1227
1511
  - The Chroma comparison currently uses shared precomputed hash embeddings to isolate retrieval/ranking behavior; semantic model comparisons should be run separately.
1228
1512
  - The BEIR SciFact run uses the hash encoder to isolate index/retrieval behavior. It is not a semantic embedding leaderboard result.
@@ -1243,10 +1527,10 @@ If you already use Chroma for local memory, see the practical migration guide:
1243
1527
  - The `quantized` backend is an explicit int8 candidate-index experiment. It
1244
1528
  reduces vector precision and must be benchmarked per workload before use.
1245
1529
  - The synthetic long-term memory evidence benchmark is useful for regression and product-shape proof, but public claims should lean on LoCoMo and LongMemEval instead.
1246
- - The LongMemEval result is retrieval-only. It is not a full LongMemEval answer-generation leaderboard-equivalent score.
1530
+ - The main LongMemEval evidence result is retrieval-only. The checked-in Ollama answer-generation comparison now includes WaveMind, Chroma static, and Qdrant static over 50 questions, but it is still not a full LongMemEval leaderboard-equivalent score.
1247
1531
  - Qdrant baselines in this README use embedded local mode. Qdrant itself warns that local mode is not recommended above 20000 points; use the `qdrant-service` benchmark profile before making production latency claims.
1248
1532
  - MTEB, MIRACL, LMEB, official VectorDBBench, and RAGBench are listed as the public benchmark roadmap, not as completed results yet.
1249
- - Ollama answer generation is implemented, but the current machine has no local Ollama model available and the local Ollama API returns 502/connection-reset. The checked-in answer file is extractive smoke only, not an LLM score.
1533
+ - Local Ollama answer generation now works with `qwen2.5:0.5b` and `qwen2.5:1.5b`. WaveMind leads the checked-in Chroma/Qdrant smoke comparison, but answer quality is still limited by small-model reasoning; rerun the comparison with stronger local/API models before making product claims.
1250
1534
  - Public benchmark adapters require optional datasets, heavier dependencies, or running services. They are intentionally outside the minimal `pip install wavemind` path.
1251
1535
  - Dynamic memory is slower than static Chroma in the current local benchmark: 25.26 ms vs 1.75 ms average query latency on this machine.
1252
1536
  - Current WaveMind-only dynamic checks keep `precision@1` at 1.00 through 5000 memories, but average latency is around 48-54 ms. The next optimization target is field/re-ranking latency, not basic recall quality.