wavemind 2.2.1__tar.gz → 2.2.2__tar.gz
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- wavemind-2.2.2/Dockerfile +28 -0
- {wavemind-2.2.1 → wavemind-2.2.2}/MANIFEST.in +5 -0
- {wavemind-2.2.1 → wavemind-2.2.2}/PKG-INFO +316 -32
- {wavemind-2.2.1 → wavemind-2.2.2}/README.md +315 -31
- wavemind-2.2.2/benchmarks/BENCHMARK_LEADERBOARD.md +32 -0
- {wavemind-2.2.1 → wavemind-2.2.2}/benchmarks/BENCHMARK_REPORT.md +11 -4
- {wavemind-2.2.1 → wavemind-2.2.2}/benchmarks/ann_index_curve_benchmark.py +8 -6
- {wavemind-2.2.1 → wavemind-2.2.2}/benchmarks/benchmark_matrix_results.json +375 -13
- {wavemind-2.2.1 → wavemind-2.2.2}/benchmarks/benchmark_registry.py +300 -28
- {wavemind-2.2.1 → wavemind-2.2.2}/benchmarks/field_memory_dynamics_benchmark.py +28 -1
- {wavemind-2.2.1 → wavemind-2.2.2}/benchmarks/field_memory_dynamics_results.json +5 -3
- wavemind-2.2.2/benchmarks/longmemeval_answer_benchmark.py +599 -0
- {wavemind-2.2.1 → wavemind-2.2.2}/benchmarks/longmemeval_answer_extractive_20_results.json +91 -4
- wavemind-2.2.2/benchmarks/longmemeval_answer_qwen25_0_5b_50_results.json +344 -0
- wavemind-2.2.2/benchmarks/longmemeval_answer_qwen25_1_5b_50_results.json +344 -0
- {wavemind-2.2.1 → wavemind-2.2.2}/benchmarks/longmemeval_evidence_50_results.json +10 -10
- {wavemind-2.2.1 → wavemind-2.2.2}/benchmarks/longmemeval_memory_benchmark.py +2 -1
- wavemind-2.2.2/benchmarks/memory_competitor_benchmark.py +244 -0
- wavemind-2.2.2/benchmarks/memory_competitor_results.json +41 -0
- wavemind-2.2.2/benchmarks/nomiracl_russian_benchmark.py +242 -0
- wavemind-2.2.2/benchmarks/nomiracl_russian_results.json +53 -0
- wavemind-2.2.2/benchmarks/production_index_profile_results.json +83 -0
- wavemind-2.2.2/benchmarks/production_load_benchmark.py +164 -0
- wavemind-2.2.2/benchmarks/production_load_qdrant_1m_results.json +66 -0
- wavemind-2.2.2/benchmarks/production_load_results.json +79 -0
- {wavemind-2.2.1 → wavemind-2.2.2}/benchmarks/render_benchmark_charts.py +25 -18
- wavemind-2.2.2/benchmarks/render_benchmark_leaderboard.py +234 -0
- {wavemind-2.2.1 → wavemind-2.2.2}/benchmarks/render_benchmark_report.py +4 -0
- wavemind-2.2.2/benchmarks/scale_readiness_benchmark.py +266 -0
- wavemind-2.2.2/benchmarks/scale_readiness_results.json +49 -0
- {wavemind-2.2.1 → wavemind-2.2.2}/benchmarks/wavemind_capacity_results.json +1 -1
- {wavemind-2.2.1 → wavemind-2.2.2}/docker-compose.yml +1 -1
- wavemind-2.2.2/docs/BENCHMARK_BRIEF.md +248 -0
- {wavemind-2.2.1 → wavemind-2.2.2}/docs/CHROMA_MIGRATION.md +16 -0
- {wavemind-2.2.1 → wavemind-2.2.2}/docs/LAUNCH_KIT.md +21 -6
- wavemind-2.2.2/docs/OBSERVABILITY.md +197 -0
- {wavemind-2.2.1 → wavemind-2.2.2}/docs/ROADMAP.md +51 -11
- {wavemind-2.2.1 → wavemind-2.2.2}/docs/RU_LAUNCH_POSTS.md +26 -43
- {wavemind-2.2.1 → wavemind-2.2.2}/docs/USE_CASES.md +25 -0
- {wavemind-2.2.1 → wavemind-2.2.2}/docs/assets/benchmark-summary.svg +60 -48
- wavemind-2.2.2/docs/assets/wavemind-demo.gif +0 -0
- wavemind-2.2.2/examples/chroma_migration.py +172 -0
- wavemind-2.2.2/examples/customer_support_memory.py +214 -0
- wavemind-2.2.2/examples/llamaindex_retriever.py +88 -0
- wavemind-2.2.2/examples/observability/README.md +18 -0
- wavemind-2.2.2/examples/observability/docker-compose.yml +44 -0
- wavemind-2.2.2/examples/observability/otel-collector.yaml +22 -0
- wavemind-2.2.2/examples/observability/prometheus-alerts.yml +39 -0
- wavemind-2.2.2/examples/observability/prometheus.yml +16 -0
- wavemind-2.2.2/examples/production-index-profile/README.md +39 -0
- wavemind-2.2.2/examples/production-index-profile/docker-compose.yml +64 -0
- wavemind-2.2.2/examples/research_notebook_memory.py +220 -0
- {wavemind-2.2.1 → wavemind-2.2.2}/pyproject.toml +1 -1
- {wavemind-2.2.1 → wavemind-2.2.2}/requirements-optional.txt +4 -0
- {wavemind-2.2.1 → wavemind-2.2.2}/tests/test_api.py +133 -0
- wavemind-2.2.2/tests/test_benchmark_brief.py +35 -0
- wavemind-2.2.2/tests/test_benchmark_leaderboard.py +39 -0
- {wavemind-2.2.1 → wavemind-2.2.2}/tests/test_benchmark_registry.py +8 -1
- {wavemind-2.2.1 → wavemind-2.2.2}/tests/test_benchmark_report.py +2 -0
- wavemind-2.2.2/tests/test_chroma_migration_example.py +92 -0
- {wavemind-2.2.1 → wavemind-2.2.2}/tests/test_cli_smoke.py +54 -0
- wavemind-2.2.2/tests/test_cluster.py +100 -0
- wavemind-2.2.2/tests/test_examples.py +259 -0
- {wavemind-2.2.1 → wavemind-2.2.2}/tests/test_field_graph_integration.py +42 -0
- {wavemind-2.2.1 → wavemind-2.2.2}/tests/test_field_memory_dynamics_benchmark.py +2 -0
- {wavemind-2.2.1 → wavemind-2.2.2}/tests/test_indexes_encoders.py +25 -0
- wavemind-2.2.2/tests/test_jobs.py +56 -0
- {wavemind-2.2.1 → wavemind-2.2.2}/tests/test_longmemeval_answer_benchmark.py +53 -1
- wavemind-2.2.2/tests/test_memory_competitor_benchmark.py +50 -0
- wavemind-2.2.2/tests/test_multimodal.py +71 -0
- wavemind-2.2.2/tests/test_nomiracl_russian_benchmark.py +73 -0
- {wavemind-2.2.1 → wavemind-2.2.2}/tests/test_observability.py +42 -0
- wavemind-2.2.2/tests/test_observability_docs.py +30 -0
- {wavemind-2.2.1 → wavemind-2.2.2}/tests/test_packaging_files.py +42 -1
- wavemind-2.2.2/tests/test_production_index_profile.py +47 -0
- wavemind-2.2.2/tests/test_production_load_benchmark.py +73 -0
- wavemind-2.2.2/tests/test_scale_plan.py +171 -0
- wavemind-2.2.2/tests/test_scale_readiness_benchmark.py +18 -0
- {wavemind-2.2.1 → wavemind-2.2.2}/wavemind/__init__.py +28 -1
- {wavemind-2.2.1 → wavemind-2.2.2}/wavemind/api.py +222 -37
- {wavemind-2.2.1 → wavemind-2.2.2}/wavemind/cli.py +148 -0
- wavemind-2.2.2/wavemind/cluster.py +215 -0
- {wavemind-2.2.1 → wavemind-2.2.2}/wavemind/core.py +184 -0
- {wavemind-2.2.1 → wavemind-2.2.2}/wavemind/indexes.py +67 -24
- wavemind-2.2.2/wavemind/jobs.py +216 -0
- wavemind-2.2.2/wavemind/multimodal.py +147 -0
- wavemind-2.2.2/wavemind/scale.py +152 -0
- {wavemind-2.2.1 → wavemind-2.2.2}/wavemind.egg-info/PKG-INFO +316 -32
- {wavemind-2.2.1 → wavemind-2.2.2}/wavemind.egg-info/SOURCES.txt +45 -0
- wavemind-2.2.1/Dockerfile +0 -23
- wavemind-2.2.1/benchmarks/longmemeval_answer_benchmark.py +0 -351
- wavemind-2.2.1/tests/test_examples.py +0 -132
- {wavemind-2.2.1 → wavemind-2.2.2}/CONTRIBUTING.md +0 -0
- {wavemind-2.2.1 → wavemind-2.2.2}/LICENSE +0 -0
- {wavemind-2.2.1 → wavemind-2.2.2}/SECURITY.md +0 -0
- {wavemind-2.2.1 → wavemind-2.2.2}/SUPPORT.md +0 -0
- {wavemind-2.2.1 → wavemind-2.2.2}/benchmarks/agent_memory_benchmark.py +0 -0
- {wavemind-2.2.1 → wavemind-2.2.2}/benchmarks/agent_memory_results.json +0 -0
- {wavemind-2.2.1 → wavemind-2.2.2}/benchmarks/ann_index_curve_results.json +0 -0
- {wavemind-2.2.1 → wavemind-2.2.2}/benchmarks/dynamic_memory_benchmark.py +0 -0
- {wavemind-2.2.1 → wavemind-2.2.2}/benchmarks/dynamic_memory_results.json +0 -0
- {wavemind-2.2.1 → wavemind-2.2.2}/benchmarks/locomo_evidence_results.json +0 -0
- {wavemind-2.2.1 → wavemind-2.2.2}/benchmarks/locomo_memory_benchmark.py +0 -0
- {wavemind-2.2.1 → wavemind-2.2.2}/benchmarks/locomo_sentence_evidence_results.json +0 -0
- {wavemind-2.2.1 → wavemind-2.2.2}/benchmarks/long_memory_evidence_benchmark.py +0 -0
- {wavemind-2.2.1 → wavemind-2.2.2}/benchmarks/long_memory_evidence_results.json +0 -0
- {wavemind-2.2.1 → wavemind-2.2.2}/benchmarks/longmemeval_evidence_results.json +0 -0
- {wavemind-2.2.1 → wavemind-2.2.2}/benchmarks/open_retrieval_benchmark.py +0 -0
- {wavemind-2.2.1 → wavemind-2.2.2}/benchmarks/open_retrieval_scifact_results.json +0 -0
- {wavemind-2.2.1 → wavemind-2.2.2}/benchmarks/ru_sentences_benchmark.py +0 -0
- {wavemind-2.2.1 → wavemind-2.2.2}/docs/DEMO_SCRIPT.md +0 -0
- {wavemind-2.2.1 → wavemind-2.2.2}/docs/PROJECT_BOARD.md +0 -0
- {wavemind-2.2.1 → wavemind-2.2.2}/docs/RELEASE.md +0 -0
- {wavemind-2.2.1 → wavemind-2.2.2}/docs/assets/wavemind-social-card.svg +0 -0
- {wavemind-2.2.1 → wavemind-2.2.2}/examples/agent_with_memory.py +0 -0
- {wavemind-2.2.1 → wavemind-2.2.2}/examples/demo.py +0 -0
- {wavemind-2.2.1 → wavemind-2.2.2}/examples/dynamic_memory_demo.py +0 -0
- {wavemind-2.2.1 → wavemind-2.2.2}/examples/framework_integrations.py +0 -0
- {wavemind-2.2.1 → wavemind-2.2.2}/examples/langchain_memory.py +0 -0
- {wavemind-2.2.1 → wavemind-2.2.2}/examples/sharded_memory.py +0 -0
- {wavemind-2.2.1 → wavemind-2.2.2}/install.bat +0 -0
- {wavemind-2.2.1 → wavemind-2.2.2}/install.sh +0 -0
- {wavemind-2.2.1 → wavemind-2.2.2}/requirements.txt +0 -0
- {wavemind-2.2.1 → wavemind-2.2.2}/setup.cfg +0 -0
- {wavemind-2.2.1 → wavemind-2.2.2}/tests/test_agent_memory_benchmark.py +0 -0
- {wavemind-2.2.1 → wavemind-2.2.2}/tests/test_ann_index_curve_benchmark.py +0 -0
- {wavemind-2.2.1 → wavemind-2.2.2}/tests/test_api_process_persistence.py +0 -0
- {wavemind-2.2.1 → wavemind-2.2.2}/tests/test_benchmark_charts.py +0 -0
- {wavemind-2.2.1 → wavemind-2.2.2}/tests/test_core_persistence.py +0 -0
- {wavemind-2.2.1 → wavemind-2.2.2}/tests/test_dynamic_memory_benchmark.py +0 -0
- {wavemind-2.2.1 → wavemind-2.2.2}/tests/test_field_graph.py +0 -0
- {wavemind-2.2.1 → wavemind-2.2.2}/tests/test_framework_adapters.py +0 -0
- {wavemind-2.2.1 → wavemind-2.2.2}/tests/test_import_benchmark.py +0 -0
- {wavemind-2.2.1 → wavemind-2.2.2}/tests/test_langchain_integration.py +0 -0
- {wavemind-2.2.1 → wavemind-2.2.2}/tests/test_locomo_memory_benchmark.py +0 -0
- {wavemind-2.2.1 → wavemind-2.2.2}/tests/test_long_memory_evidence_benchmark.py +0 -0
- {wavemind-2.2.1 → wavemind-2.2.2}/tests/test_longmemeval_memory_benchmark.py +0 -0
- {wavemind-2.2.1 → wavemind-2.2.2}/tests/test_open_retrieval_benchmark.py +0 -0
- {wavemind-2.2.1 → wavemind-2.2.2}/tests/test_postgres_storage.py +0 -0
- {wavemind-2.2.1 → wavemind-2.2.2}/tests/test_semantic_and_latency.py +0 -0
- {wavemind-2.2.1 → wavemind-2.2.2}/tests/test_sharding.py +0 -0
- {wavemind-2.2.1 → wavemind-2.2.2}/wavemind/__main__.py +0 -0
- {wavemind-2.2.1 → wavemind-2.2.2}/wavemind/benchmark.py +0 -0
- {wavemind-2.2.1 → wavemind-2.2.2}/wavemind/encoders.py +0 -0
- {wavemind-2.2.1 → wavemind-2.2.2}/wavemind/field_graph.py +0 -0
- {wavemind-2.2.1 → wavemind-2.2.2}/wavemind/importers.py +0 -0
- {wavemind-2.2.1 → wavemind-2.2.2}/wavemind/integrations/__init__.py +0 -0
- {wavemind-2.2.1 → wavemind-2.2.2}/wavemind/integrations/autogen.py +0 -0
- {wavemind-2.2.1 → wavemind-2.2.2}/wavemind/integrations/crewai.py +0 -0
- {wavemind-2.2.1 → wavemind-2.2.2}/wavemind/integrations/langchain.py +0 -0
- {wavemind-2.2.1 → wavemind-2.2.2}/wavemind/integrations/langgraph.py +0 -0
- {wavemind-2.2.1 → wavemind-2.2.2}/wavemind/integrations/llamaindex.py +0 -0
- {wavemind-2.2.1 → wavemind-2.2.2}/wavemind/observability.py +0 -0
- {wavemind-2.2.1 → wavemind-2.2.2}/wavemind/sharding.py +0 -0
- {wavemind-2.2.1 → wavemind-2.2.2}/wavemind/storage.py +0 -0
- {wavemind-2.2.1 → wavemind-2.2.2}/wavemind/studio.py +0 -0
- {wavemind-2.2.1 → wavemind-2.2.2}/wavemind.egg-info/dependency_links.txt +0 -0
- {wavemind-2.2.1 → wavemind-2.2.2}/wavemind.egg-info/entry_points.txt +0 -0
- {wavemind-2.2.1 → wavemind-2.2.2}/wavemind.egg-info/requires.txt +0 -0
- {wavemind-2.2.1 → wavemind-2.2.2}/wavemind.egg-info/top_level.txt +0 -0
@@ -0,0 +1,28 @@
+FROM python:3.11-slim
+
+ARG INSTALL_OPTIONAL=false
+ARG INSTALL_OTEL=false
+ARG INSTALL_PRODUCTION=false
+
+ENV PYTHONDONTWRITEBYTECODE=1
+ENV PYTHONUNBUFFERED=1
+ENV WAVEMIND_DB=/data/wavemind.sqlite3
+ENV WAVEMIND_LOG_LEVEL=INFO
+
+WORKDIR /app
+
+RUN if [ "$INSTALL_OPTIONAL" = "true" ] || [ "$INSTALL_PRODUCTION" = "true" ]; then apt-get update && apt-get install -y --no-install-recommends build-essential && rm -rf /var/lib/apt/lists/*; fi
+
+COPY README.md pyproject.toml requirements.txt requirements-optional.txt ./
+RUN pip install --no-cache-dir -r requirements.txt \
+&& if [ "$INSTALL_OPTIONAL" = "true" ]; then pip install --no-cache-dir -r requirements-optional.txt; fi \
+&& if [ "$INSTALL_OTEL" = "true" ]; then pip install --no-cache-dir "opentelemetry-api>=1.25" "opentelemetry-sdk>=1.25" "opentelemetry-exporter-otlp>=1.25" "opentelemetry-instrumentation-fastapi>=0.46b0"; fi
+
+COPY wavemind ./wavemind
+COPY wavemind_v2.py ./wavemind_v2.py
+RUN if [ "$INSTALL_PRODUCTION" = "true" ]; then pip install --no-cache-dir -e ".[production]"; else pip install --no-cache-dir -e .; fi
+
+VOLUME ["/data", "/backups"]
+EXPOSE 8000
+
+CMD ["uvicorn", "wavemind.api:create_app", "--factory", "--host", "0.0.0.0", "--port", "8000"]
@@ -14,13 +14,18 @@ include docs/RELEASE.md
 include docs/PROJECT_BOARD.md
 include docs/DEMO_SCRIPT.md
 include docs/LAUNCH_KIT.md
+include docs/BENCHMARK_BRIEF.md
 include docs/CHROMA_MIGRATION.md
+include docs/OBSERVABILITY.md
 include docs/RU_LAUNCH_POSTS.md
 include docs/USE_CASES.md
 include docs/assets/benchmark-summary.svg
 include docs/assets/wavemind-social-card.svg
+include docs/assets/wavemind-demo.gif
 include benchmarks/*.py
 include benchmarks/*.json
 include benchmarks/*.md
 include examples/*.py
+recursive-include examples/observability *
+recursive-include examples/production-index-profile *
 prune benchmarks/data
@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: wavemind
-Version: 2.2.
+Version: 2.2.2
 Summary: Local-first dynamic memory field with vector search and wave-field re-ranking
 License-Expression: MIT
 Project-URL: Homepage, https://github.com/CaspianG/wavemind
@@ -66,6 +66,8 @@ users or projects isolated.
 
 <img src="https://raw.githubusercontent.com/CaspianG/wavemind/main/docs/assets/wavemind-social-card.svg" alt="WaveMind dynamic memory overview" width="820">
 
+<img src="https://raw.githubusercontent.com/CaspianG/wavemind/main/docs/assets/wavemind-demo.gif" alt="WaveMind dynamic memory terminal demo" width="820">
+
 [Quick Start](#quick-start) |
 [CLI](#cli-cheat-sheet) |
 [Studio](#wavemind-studio) |
@@ -77,6 +79,8 @@ users or projects isolated.
 [Use Cases](docs/USE_CASES.md) |
 [HTTP API](#http-api) |
 [Benchmarks](#benchmark) |
+[Benchmark Brief](docs/BENCHMARK_BRIEF.md) |
+[Research Branches](#research-branches) |
 [Roadmap](#roadmap) |
 [Contributing](#contributing) |
 [Limitations](#known-limitations)
@@ -159,6 +163,7 @@ Start here if you only want to use WaveMind from the terminal:
 | Show first-run help | `wavemind quickstart` |
 | Store a memory | `wavemind remember "Andrey prefers short answers" --namespace user:42` |
 | Search memory | `wavemind query "answer style" --namespace user:42` |
+| Consolidate active patterns | `wavemind consolidate --namespace user:42 --seed "Rust compiler systems"` |
 | Open local dashboard | `wavemind studio` |
 | See stored state | `wavemind stats --namespace user:42` |
 | Delete a namespace | `wavemind forget --namespace user:42` |
@@ -272,11 +277,12 @@ wavemind --db ./state/app_memory.sqlite3 query "answer style" --namespace user:4
 | CrewAI or AutoGen loop | The adapters in `wavemind.integrations` |
 | Node, Go, Ruby, PHP, or no-code app | `wavemind serve` and the HTTP API |
 | Personal knowledge base | Store notes by project namespace and query locally |
-| Support or CRM workflow |
-| Research or
+| Support or CRM workflow | Customer issues, resolutions, preferences, corrections, TTL, and namespace isolation. See [`examples/customer_support_memory.py`](examples/customer_support_memory.py). |
+| Research or analyst notebook | Findings, hypotheses, decisions, source metadata, TTL, and project isolation. See [`examples/research_notebook_memory.py`](examples/research_notebook_memory.py). |
 
 For migrations from existing local vector memory, start with
-[`docs/CHROMA_MIGRATION.md`](docs/CHROMA_MIGRATION.md).
+[`docs/CHROMA_MIGRATION.md`](docs/CHROMA_MIGRATION.md). The guide has a tested
+offline fixture at [`examples/chroma_migration.py`](examples/chroma_migration.py).
 
 ## Minimal Agent Loop
 
@@ -322,6 +328,24 @@ python examples/dynamic_memory_demo.py
 That demo shows corrected facts outranking stale facts, temporary memory
 expiring, namespace isolation, and index-health reporting.
 
+To see the same behavior in a practical support/CRM workflow:
+
+```sh
+python examples/customer_support_memory.py
+```
+
+That demo stores customer preferences, billing tickets, stale CRM data,
+temporary discount codes, and separate customer namespaces.
+
+To see source-aware research memory:
+
+```sh
+python examples/research_notebook_memory.py
+```
+
+That demo stores analyst findings, temporary hypotheses, decisions, source
+metadata, and isolated project namespaces.
+
 ## How The Memory Field Works
 
 ```mermaid
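Both new demos lean on the two filters the README credits to TTL and namespaces: temporary facts such as discount codes drop out after expiry, and each customer or project namespace sees only its own memories. The toy sketch below illustrates that filtering with a hypothetical `ToyMemory` class and plain substring matching standing in for vector search; it is not WaveMind's implementation or API.

```python
import time
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ToyRecord:
    text: str
    namespace: str
    expires_at: Optional[float] = None  # None means the memory never expires


@dataclass
class ToyMemory:
    """Hypothetical store showing TTL expiry and namespace isolation only."""

    records: List[ToyRecord] = field(default_factory=list)

    def remember(self, text: str, namespace: str,
                 ttl_seconds: Optional[float] = None, now: Optional[float] = None) -> None:
        now = time.time() if now is None else now
        expires = None if ttl_seconds is None else now + ttl_seconds
        self.records.append(ToyRecord(text, namespace, expires))

    def query(self, needle: str, namespace: str, now: Optional[float] = None) -> List[str]:
        now = time.time() if now is None else now
        return [
            r.text
            for r in self.records
            if r.namespace == namespace                       # no cross-namespace leakage
            and (r.expires_at is None or r.expires_at > now)  # expired facts drop out
            and needle.lower() in r.text.lower()              # stand-in for vector search
        ]


memory = ToyMemory()
memory.remember("Customer 7 discount code SPRING10", "customer:7", ttl_seconds=60, now=0)
memory.remember("Customer 7 prefers email follow-ups", "customer:7", now=0)
memory.remember("Customer 9 prefers phone calls", "customer:9", now=0)

print(memory.query("customer", "customer:7", now=30))
# ['Customer 7 discount code SPRING10', 'Customer 7 prefers email follow-ups']
print(memory.query("customer", "customer:7", now=120))
# ['Customer 7 prefers email follow-ups']
```

Passing `now` explicitly keeps the example deterministic; in real use the current time would be read at query time.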
@@ -335,6 +359,8 @@ flowchart LR
 R --> P["app, search UI, prompt, API, or tool"]
 P --> F["recall feedback updates hotness / priority"]
 F --> D
+F --> C["consolidate active clusters"]
+C --> D
 ```
 
 The wave field is the dynamic layer around stored memories. It is not a
@@ -350,12 +376,19 @@ memories should still matter.
 | TTL | This fact is temporary. | Drops out after expiry. |
 | namespace and tags | This belongs to one user/project/type. | Prevents cross-user or cross-topic leakage. |
 | graph dynamics | Related memories can excite or inhibit each other. | Helps clusters and corrections behave like memory, not a flat list. |
+| consolidation | Active clusters can become durable concept memories. | Turns repeated patterns into inspectable higher-level memories with provenance. |
 
 Technically, the current `MemoryFieldGraph` is a discrete graph over stored
 memories, not a continuous mathematical physics field. That honesty matters:
 WaveMind is useful today as a dynamic memory engine, while the research path is
 to make the field dynamics more explicit, measurable, and scalable.
 
+Self-organization is now part of the core surface. `consolidate_concepts()`,
+`wavemind consolidate`, and `POST /consolidate` can turn an active graph cluster
+into a new stored memory such as `Consolidated memory: systems...` without an
+LLM call. The generated memory keeps the source memory ids in metadata, so it is
+auditable instead of being a hidden summary.
+
 ## Optional Embeddings
 
 For sentence-transformer embeddings:
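The consolidation paragraph above describes the step as LLM-free and auditable through source ids. A minimal sketch of that idea follows. The names `ActiveMemory` and `consolidate_cluster`, the `energy` field, and the shared-term naming heuristic are assumptions for illustration; `min_energy` mirrors the parameter in the `POST /consolidate` curl example, but WaveMind's `consolidate_concepts()` may select clusters and name concepts differently.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

STOPWORDS = {"a", "an", "and", "for", "in", "of", "on", "the", "to", "with"}


@dataclass
class ActiveMemory:
    id: str
    text: str
    energy: float  # hypothetical activation level after graph propagation


def consolidate_cluster(
    memories: List[ActiveMemory], min_energy: float = 0.01, top_terms: int = 3
) -> Optional[Tuple[str, Dict]]:
    """Turn an active cluster into (concept_text, metadata), or None if too small."""
    active = [m for m in memories if m.energy >= min_energy]
    if len(active) < 2:  # one memory alone is not a repeated pattern
        return None
    counts: Counter = Counter()
    for m in active:
        # count each term once per memory so one long text cannot dominate
        words = {w.strip(".,:;").lower() for w in m.text.split()}
        counts.update(words - STOPWORDS - {""})
    # sort by frequency, then alphabetically, so ties are deterministic
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    terms = [term for term, _ in ranked[:top_terms]]
    text = "Consolidated memory: " + " ".join(terms)
    # provenance: source ids keep the concept auditable rather than a hidden summary
    return text, {"kind": "concept", "source_ids": [m.id for m in active]}


cluster = [
    ActiveMemory("m1", "Rust compiler systems work", 0.40),
    ActiveMemory("m2", "The Rust borrow checker in compiler passes", 0.30),
    ActiveMemory("m3", "Systems programming with Rust", 0.20),
    ActiveMemory("m4", "Lunch order for Friday", 0.001),  # below min_energy
]
print(consolidate_cluster(cluster))
# ('Consolidated memory: rust compiler systems', {'kind': 'concept', 'source_ids': ['m1', 'm2', 'm3']})
```

Storing the result as an ordinary memory, rather than replacing the sources, is what keeps the concept inspectable: the source memories remain queryable and the metadata points back to them.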
@@ -418,6 +451,10 @@ Optional pgvector environment variables:
 - `WAVEMIND_PGVECTOR_COLLECTION` - collection key, default `default`.
 - `WAVEMIND_PGVECTOR_CREATE_HNSW=1` - create an HNSW index using
 `vector_cosine_ops` when the installed pgvector version supports it.
+- `WAVEMIND_PGVECTOR_HNSW_M` - optional HNSW graph degree for index creation.
+- `WAVEMIND_PGVECTOR_HNSW_EF_CONSTRUCTION` - optional HNSW build accuracy setting.
+- `WAVEMIND_PGVECTOR_EF_SEARCH` - optional per-query HNSW search depth. Increase
+it when pgvector is fast but recall is too low.
 
 If `WAVEMIND_PGVECTOR_DSN` is missing, WaveMind raises a clear error instead of
 silently falling back to another index backend.
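In pgvector, the HNSW build parameters are the `m` and `ef_construction` index options, and query depth is the `hnsw.ef_search` setting. The sketch below shows one way the three new variables could map onto that SQL. The helper `pgvector_hnsw_sql`, the table name `wavemind_vectors`, the `embedding` column, the index name, and treating only `1` as enabling index creation are hypothetical; WaveMind's actual DDL may differ.

```python
import os
from typing import List, Mapping, Optional


def pgvector_hnsw_sql(env: Optional[Mapping[str, str]] = None,
                      table: str = "wavemind_vectors",
                      column: str = "embedding") -> List[str]:
    """Translate the WAVEMIND_PGVECTOR_* HNSW settings into pgvector SQL (sketch)."""
    env = os.environ if env is None else env
    statements = []
    if env.get("WAVEMIND_PGVECTOR_CREATE_HNSW") == "1":
        options = []
        # m: max connections per layer; ef_construction: candidate list size while building
        for var, param in (("WAVEMIND_PGVECTOR_HNSW_M", "m"),
                           ("WAVEMIND_PGVECTOR_HNSW_EF_CONSTRUCTION", "ef_construction")):
            if env.get(var):
                options.append(f"{param} = {int(env[var])}")  # int() rejects non-numeric input
        with_clause = f" WITH ({', '.join(options)})" if options else ""
        statements.append(
            f"CREATE INDEX IF NOT EXISTS {table}_{column}_hnsw "
            f"ON {table} USING hnsw ({column} vector_cosine_ops){with_clause}"
        )
    if env.get("WAVEMIND_PGVECTOR_EF_SEARCH"):
        # session setting; higher values raise recall at the cost of latency
        statements.append(f"SET hnsw.ef_search = {int(env['WAVEMIND_PGVECTOR_EF_SEARCH'])}")
    return statements


for sql in pgvector_hnsw_sql({
    "WAVEMIND_PGVECTOR_CREATE_HNSW": "1",
    "WAVEMIND_PGVECTOR_HNSW_M": "16",
    "WAVEMIND_PGVECTOR_HNSW_EF_CONSTRUCTION": "64",
    "WAVEMIND_PGVECTOR_EF_SEARCH": "100",
}):
    print(sql)
```

pgvector's documented defaults are `m = 16`, `ef_construction = 64`, and `hnsw.ef_search = 40`, so leaving a variable unset simply keeps the extension default.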
@@ -438,6 +475,125 @@ production latency and durability should be measured against a real Qdrant
 service. If `WAVEMIND_QDRANT_URL` is missing, WaveMind raises a clear error
 instead of silently falling back to another backend.
 
+## Scale Readiness
+
+WaveMind now includes an explicit scale preflight:
+
+```sh
+wavemind scale-plan --target-memories 50000
+```
+
+For JSON output in CI or deployment checks:
+
+```sh
+wavemind --db ./state/wavemind.sqlite3 scale-plan --target-memories 50000 --json
+```
+
+To fail a deployment preflight when the plan needs action:
+
+```sh
+wavemind --db ./state/wavemind.sqlite3 scale-plan --target-memories 50000 --fail-on action_required --json
+```
+
+If you only want a plan for a future size without loading optional index
+packages:
+
+```sh
+wavemind --index faiss scale-plan --current-memories 10000 --target-memories 50000 --json
+```
+
+The scale plan reports:
+
+| field | meaning |
+|---|---|
+| `tier` | `small`, `medium`, `large-local`, `production-service`, or `million-plus`. |
+| `status` | `ok`, `watch`, `action_required`, or `architecture_required`. |
+| `recommended_index` | The candidate-index class to use before growth. |
+| `warnings` | Why the current path may fail at the target size. |
+| `actions` | Concrete setup, benchmark, rebuild, and index-health steps. |
+
+The same scale preflight is available over HTTP:
+
+```sh
+curl "http://127.0.0.1:8000/scale-plan?target_memories=50000"
+```
+
+Rule of thumb:
+
+| target memories | recommended path |
+|---:|---|
+| up to 1000 | SQLite + NumPy exact index. |
+| 1000 to 5000 | NumPy can work, but benchmark real queries. |
+| 5000 to 50000 | Persisted FAISS for local single-node, or Qdrant service. |
+| 50000 to 1M | Service-backed candidate index, namespace sharding, measured p95/p99. |
+| above 1M | External vector database plus WaveMind as the memory-policy layer. |
+
+Scale readiness profile:
+
+```sh
+python benchmarks/scale_readiness_benchmark.py --simulated-memories 1000000
+```
+
+Checked-in result:
+
+| profile | result |
+|---|---:|
+| Cluster planner | 4096 namespaces, 4 nodes, replication factor 2, single-node loss availability `1.000`. |
+| Hot cache | 2000 lookups, hit rate `0.920`, p99 lookup `0.01 ms`. |
+| Structured payloads | image/audio/table/event retrieval, precision@1 `1.000`, p99 `1.27 ms`. |
+
+This profile validates routing, cache behavior, and structured payload handling.
+It is not a 10M-vector load test. Real 100k, 1M, and 10M latency claims should
+come from service-backed FAISS/Qdrant/pgvector load tests on production-like
+hardware.
+
+Cluster placement planning:
+
+```sh
+wavemind cluster-plan \
+--namespace-count 4096 \
+--node node-a=10.0.0.1:8000 \
+--node node-b=10.0.0.2:8000 \
+--node node-c=10.0.0.3:8000 \
+--replication-factor 2 \
+--kubernetes \
+--json
+```
+
+This uses deterministic rendezvous placement so each namespace has a primary
+and replica set. The emitted Kubernetes StatefulSet manifest is a deployment
+starting point; it does not claim Raft consensus or automatic distributed
+SQLite writes.
+
+The same planner is available over HTTP as `POST /cluster-plan`.
+
+## Structured And Multimodal Memory
+
+WaveMind can store non-text memories as structured text plus metadata. This is
+useful for product events, tables, call transcripts, and image/audio captions
+while keeping the same query API.
+
+```python
+from wavemind import WaveMind, image_payload, remember_payload
+
+memory = WaveMind()
+remember_payload(
+memory,
+image_payload("s3://demo/chart.png", caption="enterprise revenue expansion chart"),
+namespace="research",
+)
+print(memory.query("enterprise expansion chart", namespace="research")[0].metadata)
+```
+
+Supported payload helpers:
+
+| helper | use case |
+|---|---|
+| `image_payload()` | image URI plus caption or alt text |
+| `audio_payload()` | audio URI plus transcript or summary |
+| `table_payload()` | compact table preview with row count |
+| `event_payload()` | structured product, user, or system event |
+
 ## Storage Backends
 
 SQLite is the default source of truth. For multi-tenant production deployments,
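Rendezvous (highest-random-weight) hashing ranks every node by a per-namespace hash and keeps the top `replication_factor` nodes, so every process computes the same placement without a coordinator. A minimal sketch of the technique follows; the `place()` helper, the SHA-256 weighting, and the `node|namespace` key format are assumptions, not the code in `wavemind/cluster.py`.

```python
import hashlib
from typing import Dict, List


def _weight(node_id: str, namespace: str) -> int:
    # Stable 64-bit weight. Python's built-in hash() is salted per process,
    # so it would give different placements on different machines.
    digest = hashlib.sha256(f"{node_id}|{namespace}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def place(namespace: str, nodes: List[str], replication_factor: int = 2) -> Dict:
    """Rendezvous (highest-random-weight) placement for one namespace."""
    if not 1 <= replication_factor <= len(nodes):
        raise ValueError("replication_factor must be between 1 and len(nodes)")
    ranked = sorted(nodes, key=lambda node: _weight(node, namespace), reverse=True)
    return {"primary": ranked[0], "replicas": ranked[1:replication_factor]}


nodes = ["node-a", "node-b", "node-c"]
plan = {f"user:{i}": place(f"user:{i}", nodes) for i in range(4096)}

# Simulate losing node-c and re-planning on the survivors.
survivors = ["node-a", "node-b"]
moved = sum(
    1
    for ns, p in plan.items()
    if p["primary"] != "node-c" and place(ns, survivors)["primary"] != p["primary"]
)
print(moved)  # 0: namespaces not homed on node-c keep their primary
```

This property underlies a single-node-loss availability figure: when a primary disappears, the next-ranked node, which the plan already lists as its first replica, becomes primary, and namespaces homed elsewhere do not move. Whether that replica holds current data depends on how writes are replicated, which the planner itself does not handle.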
@@ -512,18 +668,23 @@ curl http://127.0.0.1:8000/audit?namespace=demo
 curl http://127.0.0.1:8000/metrics
 curl http://127.0.0.1:8000/observability
 curl http://127.0.0.1:8000/index/health
+curl "http://127.0.0.1:8000/scale-plan?target_memories=50000"
 curl -X POST http://127.0.0.1:8000/index/rebuild
+curl -X POST http://127.0.0.1:8000/consolidate -H "Content-Type: application/json" -d '{"namespace":"demo","seed_text":"Rust compiler systems","min_energy":0.01}'
 curl -X POST http://127.0.0.1:8000/backup -H "Content-Type: application/json" -d '{"path":"./backups","keep_last":7}'
 ```
 
 `/audit` returns mutation events such as `remember`, `forget`, `backup`, and
-`
+`consolidate_concept`. Query audit is opt-in with `WAVEMIND_AUDIT_QUERIES=1` because
 writing an audit row for every query changes latency. `/metrics` returns a
 Prometheus-compatible text payload without adding a required dependency.
 `/index/health` reports source-of-truth versus candidate-index consistency.
 `/index/rebuild` rebuilds the candidate index from stored active memories and
 logs an `index_rebuild` audit event.
 
+Full observability guide and local Prometheus/OTEL examples:
+[`docs/OBSERVABILITY.md`](docs/OBSERVABILITY.md).
+
 OpenTelemetry traces are optional and off by default:
 
 ```sh
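For deployment scripts written in Python, the two new endpoints can be called with the standard library alone. This sketch builds the same requests as the `scale-plan` and `consolidate` curl lines above. `scale_plan_request`, `consolidate_request`, `send`, and `should_fail` are hypothetical helpers. Two further points are assumptions not confirmed by this diff: that responses are JSON with a `status` field, and that `--fail-on` style gating also trips on statuses more severe than the threshold, taking the order in which the scale-plan table lists them.

```python
import json
import urllib.parse
import urllib.request
from typing import Any

BASE_URL = "http://127.0.0.1:8000"  # address used in the curl examples above

# Assumption: statuses escalate in the order the scale-plan table lists them.
SEVERITY = ["ok", "watch", "action_required", "architecture_required"]


def scale_plan_request(target_memories: int) -> urllib.request.Request:
    query = urllib.parse.urlencode({"target_memories": target_memories})
    return urllib.request.Request(f"{BASE_URL}/scale-plan?{query}", method="GET")


def consolidate_request(namespace: str, seed_text: str,
                        min_energy: float = 0.01) -> urllib.request.Request:
    body = {"namespace": namespace, "seed_text": seed_text, "min_energy": min_energy}
    return urllib.request.Request(
        f"{BASE_URL}/consolidate",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request: urllib.request.Request, timeout: float = 10.0) -> Any:
    """Send to a running WaveMind API server and decode the JSON body."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


def should_fail(status: str, fail_on: str = "action_required") -> bool:
    """Gate a deploy when the plan's status is at or above the chosen threshold."""
    return SEVERITY.index(status) >= SEVERITY.index(fail_on)


# Usage against a live server (not run here):
#   plan = send(scale_plan_request(50000))
#   if should_fail(plan["status"]):
#       raise SystemExit(f"scale plan needs attention: {plan.get('actions')}")
```

Building `Request` objects separately from sending them keeps the request shape testable without a running server.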
@@ -642,11 +803,17 @@ Framework examples in this repository:
 | LangChain memory | `examples/langchain_memory.py` |
 | OpenAI/OpenRouter-style agent loop | `examples/agent_with_memory.py` |
 | LangGraph hooks | `wavemind.integrations.langgraph`, `examples/framework_integrations.py` |
-| LlamaIndex-style retriever | `wavemind.integrations.llamaindex`, `examples/
+| LlamaIndex-style retriever | `wavemind.integrations.llamaindex`, `examples/llamaindex_retriever.py` |
 | CrewAI-style tools | `wavemind.integrations.crewai`, `examples/framework_integrations.py` |
 | AutoGen-style hooks | `wavemind.integrations.autogen`, `examples/framework_integrations.py` |
 | Namespace sharding | `examples/sharded_memory.py` |
 
+Run the dedicated offline LlamaIndex-style retriever example:
+
+```sh
+python examples/llamaindex_retriever.py
+```
+
 ## OpenClaw Integration
 
 [OpenClaw memory](https://docs.openclaw.ai/concepts/memory) is file-centered:
@@ -782,6 +949,18 @@ memory benchmark:
 In short: static vector search answers "what is nearest?" Dynamic memory also
 asks "what is still relevant, reinforced, scoped, and allowed to be remembered?"
 
+## Research Branches
+
+The main branch stays focused on the core WaveMind library: dynamic memory,
+storage, indexes, APIs, integrations, and public memory benchmarks.
+
+Experimental domains live in separate branches so they can move quickly without
+overloading the main README:
+
+| Branch | Scope |
+|---|---|
+| [`research/crypto-pattern-memory`](https://github.com/CaspianG/wavemind/tree/research/crypto-pattern-memory) | OHLCV pattern-memory research, historical analogue retrieval, and future backtest experiments. |
+
 ## Benchmark
 
 WaveMind tracks benchmarks in two layers:
@@ -791,6 +970,7 @@ WaveMind tracks benchmarks in two layers:
 
 Machine-readable benchmark matrix: `benchmarks/benchmark_matrix_results.json`.
 Full generated benchmark report: [`benchmarks/BENCHMARK_REPORT.md`](benchmarks/BENCHMARK_REPORT.md).
+Compact benchmark leaderboard: [`benchmarks/BENCHMARK_LEADERBOARD.md`](benchmarks/BENCHMARK_LEADERBOARD.md).
 
 Visual summary generated from the checked-in JSON results:
 
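The benchmark rows in this release report retrieval metrics such as `evidence_recall@5`, `precision@1`, `MRR@5`, and `nDCG@10`. For readers checking those numbers, this sketch gives the common binary-relevance definitions. It is not taken from the scripts in `benchmarks/`, which may define a metric differently (for example, counting recall as a hit when any evidence item is retrieved) and average it per question.

```python
import math
from typing import Sequence, Set


def recall_at_k(ranked: Sequence[str], relevant: Set[str], k: int) -> float:
    """Share of the relevant ids that appear in the top k results."""
    if not relevant:
        return 0.0
    return len(set(ranked[:k]) & relevant) / len(relevant)


def precision_at_k(ranked: Sequence[str], relevant: Set[str], k: int) -> float:
    """Share of the top k slots that hold a relevant id."""
    return sum(1 for doc in ranked[:k] if doc in relevant) / k


def mrr_at_k(ranked: Sequence[str], relevant: Set[str], k: int) -> float:
    """Reciprocal rank of the first relevant id within the top k, else 0."""
    for rank, doc in enumerate(ranked[:k], start=1):
        if doc in relevant:
            return 1.0 / rank
    return 0.0


def ndcg_at_k(ranked: Sequence[str], relevant: Set[str], k: int) -> float:
    """Binary-relevance nDCG: discounted gain of this ranking over the ideal one."""
    dcg = sum(1.0 / math.log2(rank + 1)
              for rank, doc in enumerate(ranked[:k], start=1) if doc in relevant)
    ideal = sum(1.0 / math.log2(rank + 1) for rank in range(1, min(len(relevant), k) + 1))
    return dcg / ideal if ideal else 0.0


ranked = ["s3", "s1", "s7", "s2", "s9"]  # retrieved session ids, best first
relevant = {"s1", "s2"}                  # labeled evidence sessions
print(recall_at_k(ranked, relevant, 5))           # 1.0
print(precision_at_k(ranked, relevant, 1))        # 0.0
print(mrr_at_k(ranked, relevant, 5))              # 0.5
print(round(ndcg_at_k(ranked, relevant, 10), 3))  # 0.651
```

The example shows why the metrics can disagree: all evidence is found within five results (recall 1.0), yet the top hit is wrong (precision@1 0.0), which MRR and nDCG penalize only partially.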
@@ -828,14 +1008,19 @@ Current read:
|---|---|---|
| Public agent-memory evidence | On official LoCoMo `locomo10.json`, WaveMind reaches `evidence_recall@5 0.386` with hash embeddings and `0.547` with sentence-transformers. Fair namespace-filtered Chroma reaches `0.257` / `0.407`; Qdrant reaches `0.263` / `0.409`. | WaveMind retrieves more labeled evidence. Chroma is still the fastest static vector-store baseline. Qdrant's embedded local mode with payload filtering is much slower than service-mode Qdrant is expected to be. |
| Public retrieval sanity check | On BEIR SciFact, WaveMind reaches `nDCG@10 0.354`, `Recall@10 0.482`; Qdrant matches that quality; Chroma reaches `0.350` / `0.467` with identical hash embeddings. | Same-embedding retrieval quality is close. Chroma is fastest at `1.79 ms`; Qdrant local is `17.71 ms`; WaveMind exact path is `117.02 ms`. |
| Public multilingual retrieval | On NoMIRACL Russian, sampled at 200 queries / 5000 compact candidate passages, WaveMind reaches `nDCG@10 0.434`, `Recall@10 0.516`, matching Qdrant and staying within `0.002` nDCG of Chroma on identical hash embeddings. | Russian same-embedding quality is at parity. Chroma is faster at `2.60 ms`; WaveMind is `10.22 ms`; Qdrant local is `18.86 ms`. |
| Static agent recall | WaveMind `precision@1` equals Chroma at `0.82`; WaveMind `precision@3` is `0.90` vs Chroma `0.88`. | Competitive quality, but Chroma is faster on the static vector-store path. |
| Dynamic memory policy | WaveMind reaches `1.00` stale suppression; Chroma static is `0.00`. | This is the strongest current differentiation: hotness, TTL, corrections, and namespaces. |
| Field memory dynamics | Graph-enabled WaveMind reaches `1.00` `precision@1`, `1.00` stale suppression, `1.00` concept formation, and `1.00` durable concept consolidation vs static WaveMind at `0.20` / `0.20` / `0.00` / `0.00`. | This is still synthetic, but it is now a regression check for memory-to-memory excitation, conflict inhibition, decay, and self-organization into auditable concept memories. |
| Long-term evidence | WaveMind reaches `1.00` evidence recall@5, `1.00` precision@1, and `1.00` stale suppression on the synthetic long-memory evidence benchmark. | This is the first proof-shaped benchmark for agent memory: it measures whether stale/corrected/expired/cross-user facts stay out of retrieved evidence. |
| Capacity | Static `precision@1` is `0.94` at 5000 memories; dynamic policy keeps `1.00` on the current checks. | Quality is holding on these checks, but dynamic latency must be optimized. |
| LongMemEval full retrieval | On the official LongMemEval-S cleaned file, 470 non-abstention session-level questions, WaveMind reaches `evidence_recall@5 0.782` and `precision@1 0.696`; Chroma static reaches `0.518` / `0.355`; Qdrant static reaches `0.520` / `0.355`. | This is now the strongest public memory result in the repo. It is retrieval-only, not final answer quality. |
| LongMemEval 50-query smoke | On the first 50 non-abstention LongMemEval-S questions, WaveMind reaches `evidence_recall@5 0.920`, `precision@1 0.760`, and `MRR@5 0.827`; Chroma/Qdrant static reach `0.600`, `0.260`, and `0.385`. | This is the fast regression profile for checking current changes before rerunning the full LongMemEval profile. WaveMind wins on quality; latency still needs work. |
| ANN/index curve | At 50000 generated 128-d vectors, NumPy exact keeps `recall@10 1.000` at `6.49 ms`; quantized int8 keeps `0.934` at `24.92 ms`; Annoy is faster at `4.92 ms` but drops to `0.730` recall; Qdrant local keeps `1.000` recall at `43.49 ms`. | Current local scale boundary is clear: quantized search needs kernel work, Annoy needs tuning/FAISS, and Qdrant should be tested in service mode for a fair production comparison. |
| Production load | At 100000 generated 128-d vectors, service-mode Qdrant reaches `recall@10 1.000`, avg `10.76 ms`; pgvector HNSW reaches `0.736`, avg `17.76 ms`; at 1M vectors Qdrant reaches `0.506`, avg `45.81 ms`. | Qdrant service is already usable at 100k. The 1M result is not production-grade yet: large-N service settings need tuning before claiming million-memory recall. |
| Scale readiness | Deterministic 1M-memory simulation validates 4096 namespace placements over 4 nodes with replication factor 2, single-node-loss availability `1.000`, hot-cache hit rate `0.920`, and structured payload precision@1 `1.000`. | This proves routing/cache/payload foundations, not a 10M-vector load-test claim. Real 100k-10M production latency needs service-backed load tests. |
| Memory competitor adapters | WaveMind reaches `precision@1 0.80`, `precision@3 1.00`, stale suppression `1.00` on the small adapter profile. Mem0, Zep, and LangGraph are listed as skipped unless their real packages/services are configured. | This prevents fake competitor claims. The adapter harness is ready; real Mem0/Zep/LangGraph results still need configured installs. |
| LongMemEval local answer generation | With the same local Ollama `qwen2.5:1.5b`, WaveMind reaches `exact_match 0.240`, `contains_answer 0.380`, `token_f1 0.333`, and `evidence_recall@5 0.920`; Chroma and Qdrant static both reach `0.120`, `0.160`, `0.170`, and `0.600`. | This is the first checked-in end-to-end answer benchmark against Chroma/Qdrant. It is still a 50-question lightweight smoke run, not a full LongMemEval leaderboard score. |
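
The scale-readiness row checks namespace placement over 4 nodes with replication factor 2, and availability after one node is lost. The sketch below shows one minimal way to run that kind of check. It assumes a simple hash ring that puts each namespace's replicas on consecutive, distinct nodes. `place` and `availability_after_loss` are hypothetical helpers; they are not WaveMind's API or its actual placement scheme.

```python
import hashlib


def place(namespace: str, nodes: list[str], replication: int) -> list[str]:
    """Hypothetical placement: hash the namespace to a start node on a ring,
    then take the next `replication` nodes so replicas never share a node."""
    if replication > len(nodes):
        raise ValueError("replication factor cannot exceed the node count")
    digest = hashlib.sha256(namespace.encode("utf-8")).digest()
    start = int.from_bytes(digest[:8], "big") % len(nodes)
    return [nodes[(start + i) % len(nodes)] for i in range(replication)]


def availability_after_loss(placements: dict[str, list[str]], lost: str) -> float:
    """Fraction of namespaces that keep at least one replica on a live node."""
    alive = sum(1 for replicas in placements.values() if any(n != lost for n in replicas))
    return alive / len(placements)


nodes = ["node-0", "node-1", "node-2", "node-3"]
placements = {f"ns-{i}": place(f"ns-{i}", nodes, replication=2) for i in range(4096)}

# Worst case over every possible single-node failure.
worst = min(availability_after_loss(placements, lost) for lost in nodes)
print(f"namespaces={len(placements)} worst single-node-loss availability={worst:.3f}")
```

With two replicas on distinct nodes, any single failure still leaves a live copy, so the worst case is `1.000`. Rerunning with `replication=1` shows each lost node taking its share of namespaces offline. The sketch says nothing about failover latency or replica consistency.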
### Real Benchmark Matrix
@@ -843,17 +1028,22 @@ Current read:
|---|---|---|---|---|
| Agent user-memory retrieval | Natural-language recall over 200 user facts. | implemented | Chroma | Match Chroma `precision@1`, beat `precision@3`, stay under 5 ms at 200 memories. |
| Dynamic memory policy | Hot memory, TTL, corrections, stale suppression, namespace isolation. | implemented | Chroma static | Keep `precision@1` and stale suppression at 1.00, cut avg latency below 10 ms at 1000 memories. |
| Field memory graph dynamics | Related memories excite each other, newer conflicting memories suppress stale facts, graph energy decays, and active clusters can become durable concept memories. | implemented | WaveMind static | Keep `precision@1`, stale suppression, concept formation, and concept consolidation at 1.00 while moving from synthetic checks to LoCoMo/LongMemEval evidence. |
| WaveMind capacity curve | How recall and latency change at 200 / 1000 / 5000 memories. | implemented | WaveMind-only today | Keep `precision@1 >= 0.95` at 5000 memories and dynamic latency below 20 ms. |
| Long-term memory evidence | Evidence retrieval from long histories with profile, preference, correction, TTL, namespace, and filler noise. | implemented | Static vector / Chroma / Qdrant | Keep this as a small regression test while public LoCoMo and LongMemEval runners carry the stronger evidence claims. |
| BEIR-style open retrieval runner | Public `corpus.jsonl`, `queries.jsonl`, `qrels/*.tsv` datasets with the same metrics for each engine. | implemented | WaveMind / Chroma / Qdrant | Use identical embeddings and report `nDCG@k`, `Recall@k`, `MRR@k`, `precision@1`, and latency. Current checked-in run: BEIR SciFact. |
| NoMIRACL Russian retrieval | Russian human-annotated multilingual relevance over compact candidate passages. | implemented | WaveMind / Chroma / Qdrant | Keep same-embedding `nDCG@10` at parity, then rerun with sentence-transformers and full MIRACL Russian when disk/service capacity allows it. |
| ANN/VectorDBBench-style local curve | Recall/latency tradeoff for candidate indexes on generated vectors. | implemented | NumPy exact / quantized int8 / Annoy / Qdrant local | Use this as the local engineering curve; official VectorDBBench remains future work. |
| Production index profile | Docker-backed 50000-vector profile for persisted FAISS, Qdrant service, and PostgreSQL/pgvector HNSW. | implemented | FAISS / Qdrant service / pgvector | Keep service-mode candidate generation above `0.95` recall@10 and below 10 ms average query latency at 50000 vectors. |
| Production load profile | 100k and 1M service-backed candidate-index checks. | implemented | Qdrant service / pgvector HNSW / FAISS persisted | 100k Qdrant is strong; 1M Qdrant and pgvector require tuning before production claims. |
| Scale readiness profile | Cluster placement, single-node-loss simulation, hot-cache behavior, and structured/multimodal payload retrieval. | implemented | Mem0 / Zep / LangGraph persistent memory / GraphRAG target adapters | Use this as production foundation proof before real distributed 100k, 1M, and 10M load tests. |
| Memory competitor adapter profile | Dynamic-memory scenario wired for external memory frameworks. | implemented | Mem0 / Zep / LangGraph persistent memory | Report real competitor results only when their packages/services are explicitly configured. |
| [BEIR](https://github.com/beir-cellar/beir) | Standard zero-shot information retrieval quality. | planned | Chroma / Qdrant / FAISS | Stay within 0.02 `nDCG@10` on identical embeddings. |
| [MTEB Retrieval](https://github.com/embeddings-benchmark/mteb) | Separates encoder quality from retrieval-store quality. | planned | Chroma / Qdrant / FAISS | Prove WaveMind does not reduce same-embedding retrieval quality. |
| [MIRACL Russian](https://miracl.ai/) | Multilingual retrieval with Russian relevance judgments. | runner ready | Chroma / Qdrant / FAISS | NoMIRACL Russian compact run is implemented; full-corpus MIRACL Russian remains the next heavier profile. |
| [VectorDBBench](https://github.com/zilliztech/VectorDBBench) | Vector database insertion/search/filter/cost-performance benchmark. | planned | Chroma / Qdrant / Milvus / Weaviate / Pinecone / FAISS | Use only after WaveMind has a production index path; today it is a memory layer, not a standalone cloud vector DB. |
| [LoCoMo](https://arxiv.org/abs/2402.17753) | Long conversation memory, temporal consistency, multi-hop recall. Retrieval-only runner is implemented for official `locomo10.json`. | implemented | Static vector / Chroma / Qdrant | Improve answer generation accuracy on top of the stronger sentence-transformers evidence retrieval run. |
| [LongMemEval](https://arxiv.org/abs/2410.10813) | Long-term assistant memory with updates and abstention. | implemented retrieval + local Ollama answer smoke | Static vector / Chroma / Qdrant / Mem0-style memory | Add stronger LLM answer quality, abstention, and Chroma/Qdrant RAG answer baselines. |
| [LongMemEval-V2](https://arxiv.org/abs/2605.12493) | Web-agent memory: state recall, dynamic state, workflow gotchas. | planned | AgentRunbook-R / Chroma RAG / Qdrant RAG | Prove WaveMind can retrieve compact evidence from agent trajectories. |
| [LMEB](https://github.com/KaLM-Embedding/LMEB) | Long-horizon memory embedding tasks beyond normal passage retrieval. | planned | Embedding-only baselines / Chroma / Qdrant | Choose the default semantic encoder using memory-specific tasks. |
| [RAGBench](https://huggingface.co/datasets/rungalileo/ragbench) | Downstream RAG context and answer quality. | planned | Chroma RAG / Qdrant RAG / Pinecone RAG | Show whether stale-memory suppression improves context relevance. |
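
The dynamic-memory rows score stale suppression: whether corrected, expired, or other-namespace facts stay out of the results. The sketch below shows one minimal way to enforce and score that. It assumes a hypothetical `Memory` record with `namespace`, `expires_at`, and `superseded_by` fields and a precomputed similarity `score`. Hard filtering before ranking is used here only as the simplest stand-in. The field names, the metric definition, and the filtering approach are illustrative; they are not WaveMind's field policy or its benchmark runners. The last two lines compare the policy against a static ranking with no policy.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Memory:
    id: str
    namespace: str
    text: str
    score: float                          # similarity to the current query
    expires_at: Optional[float] = None    # TTL deadline in epoch seconds
    superseded_by: Optional[str] = None   # id of a newer, correcting memory


def retrieve(memories: list[Memory], namespace: str, now: float, k: int) -> list[Memory]:
    """Drop cross-namespace, expired, and corrected memories, then rank by score."""
    live = [
        m for m in memories
        if m.namespace == namespace
        and (m.expires_at is None or m.expires_at > now)
        and m.superseded_by is None
    ]
    return sorted(live, key=lambda m: m.score, reverse=True)[:k]


def stale_suppression(results_per_query: list[list[Memory]], stale_ids_per_query: list[set[str]]) -> float:
    """Fraction of queries whose results contain none of that query's stale ids."""
    clean = sum(
        1
        for results, stale in zip(results_per_query, stale_ids_per_query)
        if not any(m.id in stale for m in results)
    )
    return clean / len(results_per_query)


memories = [
    Memory("old-city", "user-a", "Lives in Berlin", 0.92, superseded_by="new-city"),
    Memory("new-city", "user-a", "Moved to Lisbon", 0.88),
    Memory("promo", "user-a", "Coupon valid this week", 0.95, expires_at=100.0),
    Memory("other", "user-b", "Lives in Oslo", 0.97),
]
stale = {"old-city", "promo", "other"}
policy_top = retrieve(memories, namespace="user-a", now=200.0, k=3)
static_top = sorted(memories, key=lambda m: m.score, reverse=True)[:3]  # no policy at all
print([m.id for m in policy_top], stale_suppression([policy_top], [stale]))
print([m.id for m in static_top], stale_suppression([static_top], [stale]))
```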
@@ -899,6 +1089,36 @@ Qdrant local preserves the same ranking quality and is much faster than the
WaveMind NumPy exact path. The engineering target is a FAISS/Annoy candidate
index with WaveMind's dynamic field policy applied only as a top-k re-ranker.

### NoMIRACL Russian Retrieval

WaveMind includes a compact multilingual retrieval runner for
[NoMIRACL](https://huggingface.co/datasets/miracl/nomiracl), the negative-aware
MIRACL relevance dataset. The checked-in run uses Russian `test.relevant`
queries and the compact Russian candidate corpus. It is not a full-corpus
MIRACL run; it is a reproducible multilingual relevance benchmark small enough
to run on a local machine.

```sh
python benchmarks/nomiracl_russian_benchmark.py --download --dataset benchmarks/data/nomiracl-russian --engines wavemind chroma qdrant --top-k 10 --limit-queries 200 --limit-corpus 5000 --output benchmarks/nomiracl_russian_results.json
```

Checked-in NoMIRACL Russian result:

200 Russian queries, 5000 compact candidate passages,
`HashingTextEncoder`, top-k 10. Full machine-readable result:
`benchmarks/nomiracl_russian_results.json`.

| engine | nDCG@10 | Recall@10 | MRR@10 | precision@1 | avg latency | p95 latency |
|---|---:|---:|---:|---:|---:|---:|
| WaveMind | 0.434 | 0.516 | 0.489 | 0.410 | 10.22 ms | 15.53 ms |
| Chroma | 0.435 | 0.519 | 0.490 | 0.410 | 2.60 ms | 3.44 ms |
| Qdrant | 0.434 | 0.516 | 0.489 | 0.410 | 18.86 ms | 24.08 ms |

Read this as multilingual same-embedding parity, not as a claim that the hash
encoder is the best Russian semantic model. The next stronger run should use
`sentence-transformers` on the same NoMIRACL split, then full MIRACL Russian
when there is enough disk/service capacity.
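
The table above reports `nDCG@10`, `Recall@10`, `MRR@10`, and `precision@1`. The sketch below gives the usual binary-relevance definitions for a single query. It assumes a ranked list of document ids and the set of relevant ids from the qrels. The runner may differ in details such as graded relevance or how it averages scores over queries.

```python
import math


def ndcg_at_k(ranked: list[str], relevant: set[str], k: int) -> float:
    """Binary-relevance nDCG@k: DCG of the ranking divided by the ideal DCG."""
    dcg = sum(1.0 / math.log2(rank + 2) for rank, doc in enumerate(ranked[:k]) if doc in relevant)
    ideal = sum(1.0 / math.log2(rank + 2) for rank in range(min(len(relevant), k)))
    return dcg / ideal if ideal > 0 else 0.0


def recall_at_k(ranked: list[str], relevant: set[str], k: int) -> float:
    """Share of the relevant documents that appear in the top k."""
    return len(set(ranked[:k]) & relevant) / len(relevant) if relevant else 0.0


def mrr_at_k(ranked: list[str], relevant: set[str], k: int) -> float:
    """Reciprocal rank of the first relevant document within the top k, else 0."""
    for rank, doc in enumerate(ranked[:k], start=1):
        if doc in relevant:
            return 1.0 / rank
    return 0.0


def precision_at_1(ranked: list[str], relevant: set[str]) -> float:
    """1.0 when the top-ranked document is relevant."""
    return 1.0 if ranked and ranked[0] in relevant else 0.0


ranked = ["d7", "d2", "d9", "d4"]   # engine output for one query
relevant = {"d2", "d4"}             # qrels for that query
print(
    f"nDCG@10={ndcg_at_k(ranked, relevant, 10):.3f}",
    f"Recall@10={recall_at_k(ranked, relevant, 10):.3f}",
    f"MRR@10={mrr_at_k(ranked, relevant, 10):.3f}",
    f"P@1={precision_at_1(ranked, relevant):.3f}",
)
```

These scores depend only on the returned ids, so engines that return the same ranking for every query report identical quality numbers regardless of their latency.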

### LoCoMo Evidence Retrieval

WaveMind now includes a retrieval-only runner for the public
@@ -1011,18 +1231,35 @@ result: `benchmarks/longmemeval_evidence_results.json`.
The Chroma and Qdrant baselines now use the same namespace/payload scope as
WaveMind. Qdrant is run in local embedded mode; the Qdrant client warns that
local mode is not recommended above 20000 points, so this latency should not be
read as a service-mode Qdrant result.

Answer-generation runner with local Ollama:

```sh
python benchmarks/longmemeval_answer_benchmark.py --dataset benchmarks/data/longmemeval_s_cleaned.json --provider ollama --model YOUR_LOCAL_MODEL --engines wavemind chroma qdrant --top-k 5 --output benchmarks/longmemeval_answer_results.json
```
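
The sketch below shows what a single answer-generation step can look like. It assumes Ollama's HTTP `POST /api/generate` endpoint on the default `localhost:11434` address, with streaming disabled. The prompt wording and the `build_prompt` / `generate_answer` helpers are hypothetical; the runner's actual prompt and parsing may differ.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint


def build_prompt(question: str, evidence: list[str]) -> str:
    """Hypothetical prompt: numbered retrieved snippets followed by the question."""
    context = "\n".join(f"[{i}] {snippet}" for i, snippet in enumerate(evidence, start=1))
    return (
        "Answer the question using only the memory snippets below. "
        "Reply with a short answer.\n\n"
        f"Memory:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )


def generate_answer(model: str, prompt: str, timeout: float = 120.0) -> str:
    """Send one non-streaming request to Ollama and return the generated text."""
    payload = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode("utf-8")
    request = urllib.request.Request(
        OLLAMA_URL, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read())["response"].strip()


prompt = build_prompt(
    "Which city did the user move to?",
    ["User: I just moved to Lisbon last month.", "User: My old flat in Berlin was tiny."],
)
print(prompt)
# answer = generate_answer("qwen2.5:1.5b", prompt)  # requires a running Ollama server
```

Only `build_prompt` runs offline. `generate_answer` needs a running Ollama server with the model already pulled, for example with `ollama pull qwen2.5:1.5b`.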

Checked-in local answer-generation smoke runs:

50 non-abstention LongMemEval-S questions, compact retrieved evidence,
same `HashingTextEncoder`, same local Ollama model, top-k 5. Full machine-readable results:
`benchmarks/longmemeval_answer_qwen25_0_5b_50_results.json` and
`benchmarks/longmemeval_answer_qwen25_1_5b_50_results.json`.

| system | questions | evidence recall@5 | exact match | contains answer | token F1 | avg retrieval | avg generation |
|---|---:|---:|---:|---:|---:|---:|---:|
| WaveMind + Ollama `qwen2.5:0.5b` | 50 | 0.920 | 0.120 | 0.180 | 0.183 | 2.98 ms | 1428.20 ms |
| Chroma static + Ollama `qwen2.5:0.5b` | 50 | 0.600 | 0.100 | 0.120 | 0.126 | 4.10 ms | 1234.69 ms |
| Qdrant static + Ollama `qwen2.5:0.5b` | 50 | 0.600 | 0.100 | 0.120 | 0.126 | 63.80 ms | 893.48 ms |
| WaveMind + Ollama `qwen2.5:1.5b` | 50 | 0.920 | 0.240 | 0.380 | 0.333 | 2.00 ms | 2153.00 ms |
| Chroma static + Ollama `qwen2.5:1.5b` | 50 | 0.600 | 0.120 | 0.160 | 0.170 | 7.05 ms | 2082.38 ms |
| Qdrant static + Ollama `qwen2.5:1.5b` | 50 | 0.600 | 0.120 | 0.160 | 0.170 | 100.20 ms | 758.11 ms |

There is also an extractive smoke run that does not require a model:
`benchmarks/longmemeval_answer_extractive_20_results.json`. It is only a runner
check, not a meaningful final answer-quality benchmark. The Ollama runs are real
local LLM runs, but still lightweight smoke results rather than official
LongMemEval leaderboard scores.
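
The answer tables report `exact_match`, `contains_answer`, and `token_f1`. The sketch below gives common definitions for all three. It assumes SQuAD-style normalization: lowercase the text, strip punctuation and English articles, and collapse whitespace. The runner's exact normalization may differ.

```python
import re
import string
from collections import Counter

PUNCTUATION = set(string.punctuation)


def normalize(text: str) -> str:
    """Lowercase, drop punctuation and English articles, collapse whitespace."""
    text = "".join(ch for ch in text.lower() if ch not in PUNCTUATION)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction: str, gold: str) -> float:
    """1.0 when the normalized prediction equals the normalized gold answer."""
    return float(normalize(prediction) == normalize(gold))


def contains_answer(prediction: str, gold: str) -> float:
    """1.0 when the normalized gold answer occurs inside the normalized prediction."""
    return float(normalize(gold) in normalize(prediction))


def token_f1(prediction: str, gold: str) -> float:
    """Harmonic mean of token precision and recall after normalization."""
    pred_tokens = normalize(prediction).split()
    gold_tokens = normalize(gold).split()
    overlap = sum((Counter(pred_tokens) & Counter(gold_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)


prediction, gold = "The user moved to Lisbon.", "Lisbon"
print(exact_match(prediction, gold), contains_answer(prediction, gold), round(token_f1(prediction, gold), 3))
```

Under definitions like these, a verbose but correct answer fails exact match but still contains the gold string. That would explain why `contains answer` sits above `exact match` in every row of the table.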

### ANN Index Curve
@@ -1040,13 +1277,12 @@ Add `qdrant-service` when `WAVEMIND_QDRANT_URL` points at a running Qdrant
service. Add `faiss-persisted` when `WAVEMIND_FAISS_PATH` points at the FAISS
snapshot file to validate persisted-index startup behavior.

Reproducible Docker production profile:

```sh
# Start the qdrant and postgres services in the background.
docker compose -f examples/production-index-profile/docker-compose.yml up -d qdrant postgres
# Run the benchmark service once against them; --rm deletes its container when it exits.
docker compose -f examples/production-index-profile/docker-compose.yml run --rm benchmark
# Stop and remove the profile's containers and network.
docker compose -f examples/production-index-profile/docker-compose.yml down
```

Checked-in 50000-vector point:
@@ -1059,15 +1295,62 @@ Checked-in 50000-vector point:
| WaveMind faiss | skipped | - | - | - |
| Qdrant local | 1.000 | 43.49 ms | 59.68 ms | 17525.7 ms |
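
`recall@10` in these tables measures how many of the exact-search top-10 neighbors each candidate index returns. The sketch below shows that measurement. It assumes ground truth comes from brute-force inner-product search over the same vectors; the benchmark may use a different distance. `exact_top_k` and `recall_at_k` are illustrative names. The "lossy index" is a deliberately crude stand-in that scans only part of the data, not a real ANN structure.

```python
import random


def dot(a: list[float], b: list[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def exact_top_k(query: list[float], vectors: list[list[float]], k: int) -> list[int]:
    """Brute-force ground truth: ids of the k highest inner-product scores."""
    return sorted(range(len(vectors)), key=lambda i: dot(query, vectors[i]), reverse=True)[:k]


def recall_at_k(candidate_ids: list[int], exact_ids: list[int], k: int) -> float:
    """Share of the true top-k neighbors that the candidate index returned."""
    return len(set(candidate_ids[:k]) & set(exact_ids[:k])) / k


rng = random.Random(0)
dim, n, k = 16, 2000, 10
vectors = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(n)]
queries = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(20)]

# Stand-in for a lossy candidate index: it only ever scans a random 70% of the data.
visible = rng.sample(range(n), int(0.7 * n))

scores = []
for q in queries:
    truth = exact_top_k(q, vectors, k)
    candidates = sorted(visible, key=lambda i: dot(q, vectors[i]), reverse=True)[:k]
    scores.append(recall_at_k(candidates, truth, k))
print(f"mean recall@{k} = {sum(scores) / len(scores):.3f}")
```

A top-k re-ranker, such as WaveMind's field policy, can never recover a neighbor that the candidate index fails to return. Candidate-index recall therefore caps what re-ranking can achieve.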

Checked-in production 50000-vector point:

| engine | recall@10 | avg latency | p95 latency | build |
|---|---:|---:|---:|---:|
| WaveMind faiss-persisted | 1.000 | 3.52 ms | 7.88 ms | 715.9 ms |
| Qdrant service | 1.000 | 4.41 ms | 5.93 ms | 12269.8 ms |
| WaveMind pgvector | 0.811 | 10.95 ms | 15.69 ms | 185048.9 ms |

Checked-in production load points:

```sh
# 100k vectors against Qdrant service, pgvector, and persisted FAISS.
python benchmarks/production_load_benchmark.py --sizes 100000 --dim 128 --queries 100 --top-k 10 --engines qdrant-service pgvector faiss-persisted
# 1M vectors against Qdrant service only, written to its own results file.
python benchmarks/production_load_benchmark.py --sizes 1000000 --dim 128 --queries 50 --top-k 10 --engines qdrant-service --output benchmarks/production_load_qdrant_1m_results.json
```

| vectors | engine | recall@10 | avg latency | p95 latency | build |
|---:|---|---:|---:|---:|---:|
| 100000 | Qdrant service | 1.000 | 10.76 ms | 18.78 ms | 39873.2 ms |
| 100000 | WaveMind pgvector | 0.736 | 17.76 ms | 23.48 ms | 455703.7 ms |
| 100000 | WaveMind faiss-persisted | skipped | - | - | - |
| 1000000 | Qdrant service | 0.506 | 45.81 ms | 65.18 ms | 563945.5 ms |
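
The load tables report average and p95 query latency. The sketch below collects both numbers around any search callable. It assumes a nearest-rank p95; tools that interpolate can report slightly different values. `measure_latency` is a hypothetical helper, and `search` is only a placeholder workload, not a Qdrant or pgvector client.

```python
import math
import statistics
import time


def percentile_nearest_rank(samples: list[float], pct: float) -> float:
    """Smallest sample such that at least pct% of all samples are at or below it."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def measure_latency(search, queries) -> dict[str, float]:
    """Time each query with a monotonic high-resolution clock; report milliseconds."""
    timings_ms = []
    for query in queries:
        start = time.perf_counter()
        search(query)
        timings_ms.append((time.perf_counter() - start) * 1000.0)
    return {
        "avg_ms": statistics.fmean(timings_ms),
        "p95_ms": percentile_nearest_rank(timings_ms, 95),
    }


def search(query: int) -> list[int]:
    # Placeholder workload; a real run would call the candidate index here.
    return sorted(range(5000), key=lambda i: (i * 2654435761 + query) % 1000)[:10]


stats = measure_latency(search, range(100))
print(f"avg {stats['avg_ms']:.2f} ms, p95 {stats['p95_ms']:.2f} ms")
```

Reporting both columns matters. In the 1M Qdrant row, for example, p95 (`65.18 ms`) is well above the `45.81 ms` average.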

Read this as an engineering curve, not an official VectorDBBench result. Annoy
is faster than exact NumPy at 50000 vectors but loses too much recall with the
current settings. The new `quantized` backend compresses vectors and keeps
`0.934` recall@10 on this run, but the current Python/NumPy kernel is slower
than exact NumPy; it is a memory-footprint baseline, not a latency win yet.

FAISS persistence and service-mode Qdrant now both match exact-search recall
(`recall@10 1.000`) at 50000 generated vectors. The checked-in pgvector/HNSW profile uses
`WAVEMIND_PGVECTOR_EF_SEARCH=400`, which improves recall materially but still
misses the `0.95` production target and is slower than the other two profiles.
The 100k load profile shows Qdrant service is already viable for candidate
generation; the 1M Qdrant profile shows that default service settings are not
enough for production recall and need HNSW/search tuning before million-memory
claims.

If a required package, service, or environment variable is missing, the runner
marks that engine as `skipped` instead of silently falling back to another
backend.
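
The `quantized` backend is described as an int8 candidate index that trades vector precision for a smaller footprint. The sketch below shows the generic technique. It assumes symmetric per-vector scaling, with one float scale per vector and codes in [-127, 127]. It illustrates int8 quantization in general and does not reproduce WaveMind's kernel.

```python
import array
import random


def quantize_int8(vector: list[float]) -> tuple[array.array, float]:
    """Symmetric quantization: map [-max|x|, max|x|] onto the int8 codes [-127, 127]."""
    max_abs = max(abs(x) for x in vector) or 1.0
    scale = max_abs / 127.0
    codes = array.array("b", (round(x / scale) for x in vector))
    return codes, scale


def approx_dot(query: list[float], codes: array.array, scale: float) -> float:
    """Inner product against the dequantized vector (codes * scale)."""
    return scale * sum(q * c for q, c in zip(query, codes))


rng = random.Random(42)
dim, n = 128, 500
vectors = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(n)]
query = [rng.gauss(0, 1) for _ in range(dim)]
quantized = [quantize_int8(v) for v in vectors]

exact = sorted(range(n), key=lambda i: sum(q * x for q, x in zip(query, vectors[i])), reverse=True)[:10]
approx = sorted(range(n), key=lambda i: approx_dot(query, *quantized[i]), reverse=True)[:10]

float32_bytes = n * dim * 4   # one float32 per component
int8_bytes = n * (dim + 4)    # one int8 per component plus a float32 scale per vector
print(f"recall@10 vs exact: {len(set(exact) & set(approx)) / 10:.1f}")
print(f"storage: {float32_bytes} B as float32 vs {int8_bytes} B as int8")
```

The roughly 4x storage reduction is the main benefit. A latency win would also require a fast integer kernel, which this pure-Python sketch does not attempt.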

### Memory Competitor Adapter Profile

WaveMind includes a small dynamic-memory adapter profile for Mem0, Zep, and
LangGraph persistent memory. It checks corrections, TTL, namespace isolation,
and preference recall. Missing competitors are marked `skipped` with setup
reasons instead of being approximated.

```sh
python benchmarks/memory_competitor_benchmark.py --engines wavemind mem0 zep langgraph
```

| engine | precision@1 | precision@3 | stale suppression | avg latency |
|---|---:|---:|---:|---:|
| WaveMind | 0.80 | 1.00 | 1.00 | 0.55 ms |
| Mem0 | skipped | - | - | - |
| Zep | skipped | - | - | - |
| LangGraph persistent memory | skipped | - | - | - |
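
This profile and the production index runner both report an engine as `skipped` when its package, service, or environment variable is missing, rather than substituting another backend. The sketch below shows a minimal version of that guard. It assumes each engine declares one importable module plus the environment variables it requires. The requirement table, module names, and helper names are hypothetical; only `WAVEMIND_QDRANT_URL` and `WAVEMIND_FAISS_PATH` come from this README.

```python
import importlib.util
import os
from dataclasses import dataclass, field
from typing import Callable, Mapping, Optional


@dataclass
class EngineRequirements:
    module: str                                        # importable Python module
    env_vars: list[str] = field(default_factory=list)  # must be set and non-empty


# Hypothetical requirement table for illustration only.
REQUIREMENTS = {
    "qdrant-service": EngineRequirements("qdrant_client", ["WAVEMIND_QDRANT_URL"]),
    "faiss-persisted": EngineRequirements("faiss", ["WAVEMIND_FAISS_PATH"]),
    "mem0": EngineRequirements("mem0"),
    "langgraph": EngineRequirements("langgraph"),
}


def skip_reason(engine: str, requirements: Mapping[str, EngineRequirements] = REQUIREMENTS,
                environ: Mapping[str, str] = os.environ) -> Optional[str]:
    """Return why an engine cannot run, or None when it is fully configured."""
    req = requirements.get(engine)
    if req is None:
        return f"unknown engine {engine!r}"
    if importlib.util.find_spec(req.module) is None:
        return f"python module {req.module!r} is not installed"
    missing = [name for name in req.env_vars if not environ.get(name)]
    if missing:
        return "missing environment variables: " + ", ".join(missing)
    return None


def run_engines(engines: list[str], run_one: Callable[[str], dict],
                requirements: Mapping[str, EngineRequirements] = REQUIREMENTS,
                environ: Mapping[str, str] = os.environ) -> dict[str, dict]:
    """Run configured engines; record the rest as skipped, never substituting a backend."""
    results = {}
    for engine in engines:
        reason = skip_reason(engine, requirements, environ)
        results[engine] = run_one(engine) if reason is None else {"status": "skipped", "reason": reason}
    return results


if __name__ == "__main__":
    for engine, result in run_engines(["qdrant-service", "mem0"], lambda engine: {"status": "ok"}).items():
        print(engine, result)
```

Returning a reason string rather than a bare flag keeps the setup hint in the results file, next to the `skipped` status.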

### Current Local Runs
@@ -1076,13 +1359,13 @@ Field memory dynamics benchmark:
13 memories, 5 conflicting-fact queries, deterministic local encoder.
This benchmark isolates the `MemoryFieldGraph`: related memories can spread
activation, newer conflicting memories inhibit stale facts, graph energy decays,
and active clusters can surface and persist concept memories.
Full machine-readable result: `benchmarks/field_memory_dynamics_results.json`.

| engine | precision@1 | precision@3 | stale suppression | concept formation | concept consolidation | decay ratio | avg latency |
|---|---:|---:|---:|---:|---:|---:|---:|
| WaveMind graph | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 0.81 | 1.81 ms |
| WaveMind static | 0.20 | 1.00 | 0.20 | 0.00 | 0.00 | 0.00 | 0.48 ms |
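
The `MemoryFieldGraph` is described as spreading activation between related memories, letting newer conflicting memories inhibit stale facts, and decaying graph energy. The toy sketch below runs those three mechanics on a hand-built three-node graph. It assumes linear excitation and inhibition with a constant decay factor. The graph, weights, and `step` function are hypothetical, and the energy ratio printed here is not the benchmark's `decay ratio` metric.

```python
def step(activation: dict[str, float],
         excite: dict[str, list[tuple[str, float]]],
         inhibit: dict[str, list[tuple[str, float]]],
         decay: float = 0.6, rate: float = 0.3) -> dict[str, float]:
    """One synchronous update: decay each node, add excitation from related nodes,
    subtract inhibition from newer conflicting nodes, and clamp at zero."""
    updated = {}
    for node, value in activation.items():
        incoming = sum(w * activation[src] for src, w in excite.get(node, []))
        suppression = sum(w * activation[src] for src, w in inhibit.get(node, []))
        updated[node] = max(0.0, decay * value + rate * (incoming - suppression))
    return updated


# Hypothetical memories: the user moved from Berlin to Lisbon.
# The query matches both city facts; the related flat memory starts cold.
activation = {"lives_berlin": 1.0, "moved_lisbon": 0.9, "lisbon_flat": 0.0}
# excite[target] and inhibit[target] list (source, weight) pairs acting on target.
excite = {"moved_lisbon": [("lisbon_flat", 1.0)], "lisbon_flat": [("moved_lisbon", 1.0)]}
inhibit = {"lives_berlin": [("moved_lisbon", 1.5)]}  # the newer fact suppresses the stale one

initial_energy = sum(activation.values())
for _ in range(3):
    activation = step(activation, excite, inhibit)

ranking = sorted(activation, key=activation.get, reverse=True)
print(ranking, {node: round(value, 3) for node, value in activation.items()})
print("energy ratio after 3 steps:", round(sum(activation.values()) / initial_energy, 3))
```

The stale Berlin fact starts with the highest raw match but is driven to zero. The Lisbon fact ends up ranked first, and the never-queried flat memory is pulled up by association. Total energy shrinks throughout. In this two-node excitation loop, energy decays rather than grows because `decay + rate * weight` stays below 1.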

Run locally from a cloned repository:
@@ -1223,6 +1506,7 @@ If you already use Chroma for local memory, see the practical migration guide:
- Optimal capacity on the current NumPy exact index is up to 1000 records.
- At 5000 records, one-word `precision@1` is currently 0.72 with the hash encoder; many misses are ambiguous queries where another sentence containing the same word ranks first.
- For `N > 5000`, the NumPy exact index is still reliable but scales linearly. Annoy is faster at 50000 vectors in the local curve, but current recall is only `0.730`; the `quantized` backend reaches `0.934` recall@10 but is slower than NumPy on the current kernel. Use FAISS or a production vector service before claiming large-scale ANN quality.
- Run `wavemind scale-plan --target-memories <N>` before growing a deployment. It is a guardrail, not a benchmark replacement: it tells you when NumPy is no longer the right candidate index and which checks to run next.
- `sentence-transformers/paraphrase-multilingual-mpnet-base-v2` requires about 420 MB of model files. Benchmark runners cache embeddings so retrieval latency is measured separately from model encoding latency.
- The Chroma comparison currently uses shared precomputed hash embeddings to isolate retrieval/ranking behavior; semantic model comparisons should be run separately.
- The BEIR SciFact run uses the hash encoder to isolate index/retrieval behavior. It is not a semantic embedding leaderboard result.
@@ -1243,10 +1527,10 @@ If you already use Chroma for local memory, see the practical migration guide:
- The `quantized` backend is an explicit int8 candidate-index experiment. It
reduces vector precision and must be benchmarked per workload before use.
- The synthetic long-term memory evidence benchmark is useful for regression and product-shape proof, but public claims should lean on LoCoMo and LongMemEval instead.
- The main LongMemEval evidence result is retrieval-only. The checked-in Ollama answer-generation comparison now includes WaveMind, Chroma static, and Qdrant static over 50 questions, but it is still not a full LongMemEval leaderboard-equivalent score.
- Except for the `qdrant-service` production profiles, the Qdrant baselines in this README use embedded local mode. Qdrant itself warns that local mode is not recommended above 20000 points; use the `qdrant-service` benchmark profile before making production latency claims.
- MTEB, full-corpus MIRACL, LMEB, official VectorDBBench, and RAGBench are listed as the public benchmark roadmap, not as completed results yet.
- Local Ollama answer generation now works with `qwen2.5:0.5b` and `qwen2.5:1.5b`; WaveMind leads the checked-in Chroma/Qdrant smoke comparison, but answer quality is still limited by small-model reasoning and should be rerun with stronger local/API models before making product claims.
- Public benchmark adapters require optional datasets, heavier dependencies, or running services. They are intentionally outside the minimal `pip install wavemind` path.
- Dynamic memory is slower than static Chroma in the current local benchmark: 25.26 ms vs 1.75 ms average query latency on this machine.
- Current WaveMind-only dynamic checks keep `precision@1` at 1.00 through 5000 memories, but average latency is around 48-54 ms. The next optimization target is field/re-ranking latency, not basic recall quality.