wavemind 2.2.2__tar.gz → 2.2.4__tar.gz
- {wavemind-2.2.2 → wavemind-2.2.4}/PKG-INFO +82 -30
- {wavemind-2.2.2 → wavemind-2.2.4}/README.md +78 -29
- {wavemind-2.2.2 → wavemind-2.2.4}/benchmarks/BENCHMARK_LEADERBOARD.md +4 -3
- {wavemind-2.2.2 → wavemind-2.2.4}/benchmarks/BENCHMARK_REPORT.md +4 -3
- {wavemind-2.2.2 → wavemind-2.2.4}/benchmarks/ann_index_curve_benchmark.py +79 -16
- {wavemind-2.2.2 → wavemind-2.2.4}/benchmarks/benchmark_matrix_results.json +79 -16
- {wavemind-2.2.2 → wavemind-2.2.4}/benchmarks/benchmark_registry.py +53 -10
- {wavemind-2.2.2 → wavemind-2.2.4}/benchmarks/production_load_benchmark.py +6 -3
- wavemind-2.2.4/benchmarks/production_load_qdrant_100k_tuned_results.json +75 -0
- wavemind-2.2.4/benchmarks/production_load_qdrant_1m_ef_sweep_results.json +79 -0
- wavemind-2.2.4/benchmarks/production_load_qdrant_1m_tuned_results.json +75 -0
- {wavemind-2.2.2 → wavemind-2.2.4}/benchmarks/scale_readiness_benchmark.py +84 -0
- wavemind-2.2.4/benchmarks/scale_readiness_results.json +64 -0
- {wavemind-2.2.2 → wavemind-2.2.4}/docker-compose.yml +1 -1
- {wavemind-2.2.2 → wavemind-2.2.4}/docs/BENCHMARK_BRIEF.md +27 -17
- {wavemind-2.2.2 → wavemind-2.2.4}/docs/LAUNCH_KIT.md +13 -5
- {wavemind-2.2.2 → wavemind-2.2.4}/docs/ROADMAP.md +6 -5
- {wavemind-2.2.2 → wavemind-2.2.4}/pyproject.toml +5 -1
- {wavemind-2.2.2 → wavemind-2.2.4}/tests/test_cli_smoke.py +27 -0
- {wavemind-2.2.2 → wavemind-2.2.4}/tests/test_cluster.py +25 -0
- wavemind-2.2.4/tests/test_jobs.py +126 -0
- wavemind-2.2.4/tests/test_replication.py +106 -0
- {wavemind-2.2.2 → wavemind-2.2.4}/tests/test_scale_readiness_benchmark.py +4 -0
- {wavemind-2.2.2 → wavemind-2.2.4}/wavemind/__init__.py +17 -2
- {wavemind-2.2.2 → wavemind-2.2.4}/wavemind/cli.py +22 -0
- {wavemind-2.2.2 → wavemind-2.2.4}/wavemind/cluster.py +76 -2
- {wavemind-2.2.2 → wavemind-2.2.4}/wavemind/jobs.py +158 -1
- wavemind-2.2.4/wavemind/replication.py +399 -0
- {wavemind-2.2.2 → wavemind-2.2.4}/wavemind.egg-info/PKG-INFO +82 -30
- {wavemind-2.2.2 → wavemind-2.2.4}/wavemind.egg-info/SOURCES.txt +5 -0
- {wavemind-2.2.2 → wavemind-2.2.4}/wavemind.egg-info/requires.txt +4 -0
- wavemind-2.2.2/benchmarks/scale_readiness_results.json +0 -49
- wavemind-2.2.2/tests/test_jobs.py +0 -56
- {wavemind-2.2.2 → wavemind-2.2.4}/CONTRIBUTING.md +0 -0
- {wavemind-2.2.2 → wavemind-2.2.4}/Dockerfile +0 -0
- {wavemind-2.2.2 → wavemind-2.2.4}/LICENSE +0 -0
- {wavemind-2.2.2 → wavemind-2.2.4}/MANIFEST.in +0 -0
- {wavemind-2.2.2 → wavemind-2.2.4}/SECURITY.md +0 -0
- {wavemind-2.2.2 → wavemind-2.2.4}/SUPPORT.md +0 -0
- {wavemind-2.2.2 → wavemind-2.2.4}/benchmarks/agent_memory_benchmark.py +0 -0
- {wavemind-2.2.2 → wavemind-2.2.4}/benchmarks/agent_memory_results.json +0 -0
- {wavemind-2.2.2 → wavemind-2.2.4}/benchmarks/ann_index_curve_results.json +0 -0
- {wavemind-2.2.2 → wavemind-2.2.4}/benchmarks/dynamic_memory_benchmark.py +0 -0
- {wavemind-2.2.2 → wavemind-2.2.4}/benchmarks/dynamic_memory_results.json +0 -0
- {wavemind-2.2.2 → wavemind-2.2.4}/benchmarks/field_memory_dynamics_benchmark.py +0 -0
- {wavemind-2.2.2 → wavemind-2.2.4}/benchmarks/field_memory_dynamics_results.json +0 -0
- {wavemind-2.2.2 → wavemind-2.2.4}/benchmarks/locomo_evidence_results.json +0 -0
- {wavemind-2.2.2 → wavemind-2.2.4}/benchmarks/locomo_memory_benchmark.py +0 -0
- {wavemind-2.2.2 → wavemind-2.2.4}/benchmarks/locomo_sentence_evidence_results.json +0 -0
- {wavemind-2.2.2 → wavemind-2.2.4}/benchmarks/long_memory_evidence_benchmark.py +0 -0
- {wavemind-2.2.2 → wavemind-2.2.4}/benchmarks/long_memory_evidence_results.json +0 -0
- {wavemind-2.2.2 → wavemind-2.2.4}/benchmarks/longmemeval_answer_benchmark.py +0 -0
- {wavemind-2.2.2 → wavemind-2.2.4}/benchmarks/longmemeval_answer_extractive_20_results.json +0 -0
- {wavemind-2.2.2 → wavemind-2.2.4}/benchmarks/longmemeval_answer_qwen25_0_5b_50_results.json +0 -0
- {wavemind-2.2.2 → wavemind-2.2.4}/benchmarks/longmemeval_answer_qwen25_1_5b_50_results.json +0 -0
- {wavemind-2.2.2 → wavemind-2.2.4}/benchmarks/longmemeval_evidence_50_results.json +0 -0
- {wavemind-2.2.2 → wavemind-2.2.4}/benchmarks/longmemeval_evidence_results.json +0 -0
- {wavemind-2.2.2 → wavemind-2.2.4}/benchmarks/longmemeval_memory_benchmark.py +0 -0
- {wavemind-2.2.2 → wavemind-2.2.4}/benchmarks/memory_competitor_benchmark.py +0 -0
- {wavemind-2.2.2 → wavemind-2.2.4}/benchmarks/memory_competitor_results.json +0 -0
- {wavemind-2.2.2 → wavemind-2.2.4}/benchmarks/nomiracl_russian_benchmark.py +0 -0
- {wavemind-2.2.2 → wavemind-2.2.4}/benchmarks/nomiracl_russian_results.json +0 -0
- {wavemind-2.2.2 → wavemind-2.2.4}/benchmarks/open_retrieval_benchmark.py +0 -0
- {wavemind-2.2.2 → wavemind-2.2.4}/benchmarks/open_retrieval_scifact_results.json +0 -0
- {wavemind-2.2.2 → wavemind-2.2.4}/benchmarks/production_index_profile_results.json +0 -0
- {wavemind-2.2.2 → wavemind-2.2.4}/benchmarks/production_load_qdrant_1m_results.json +0 -0
- {wavemind-2.2.2 → wavemind-2.2.4}/benchmarks/production_load_results.json +0 -0
- {wavemind-2.2.2 → wavemind-2.2.4}/benchmarks/render_benchmark_charts.py +0 -0
- {wavemind-2.2.2 → wavemind-2.2.4}/benchmarks/render_benchmark_leaderboard.py +0 -0
- {wavemind-2.2.2 → wavemind-2.2.4}/benchmarks/render_benchmark_report.py +0 -0
- {wavemind-2.2.2 → wavemind-2.2.4}/benchmarks/ru_sentences_benchmark.py +0 -0
- {wavemind-2.2.2 → wavemind-2.2.4}/benchmarks/wavemind_capacity_results.json +0 -0
- {wavemind-2.2.2 → wavemind-2.2.4}/docs/CHROMA_MIGRATION.md +0 -0
- {wavemind-2.2.2 → wavemind-2.2.4}/docs/DEMO_SCRIPT.md +0 -0
- {wavemind-2.2.2 → wavemind-2.2.4}/docs/OBSERVABILITY.md +0 -0
- {wavemind-2.2.2 → wavemind-2.2.4}/docs/PROJECT_BOARD.md +0 -0
- {wavemind-2.2.2 → wavemind-2.2.4}/docs/RELEASE.md +0 -0
- {wavemind-2.2.2 → wavemind-2.2.4}/docs/RU_LAUNCH_POSTS.md +0 -0
- {wavemind-2.2.2 → wavemind-2.2.4}/docs/USE_CASES.md +0 -0
- {wavemind-2.2.2 → wavemind-2.2.4}/docs/assets/benchmark-summary.svg +0 -0
- {wavemind-2.2.2 → wavemind-2.2.4}/docs/assets/wavemind-demo.gif +0 -0
- {wavemind-2.2.2 → wavemind-2.2.4}/docs/assets/wavemind-social-card.svg +0 -0
- {wavemind-2.2.2 → wavemind-2.2.4}/examples/agent_with_memory.py +0 -0
- {wavemind-2.2.2 → wavemind-2.2.4}/examples/chroma_migration.py +0 -0
- {wavemind-2.2.2 → wavemind-2.2.4}/examples/customer_support_memory.py +0 -0
- {wavemind-2.2.2 → wavemind-2.2.4}/examples/demo.py +0 -0
- {wavemind-2.2.2 → wavemind-2.2.4}/examples/dynamic_memory_demo.py +0 -0
- {wavemind-2.2.2 → wavemind-2.2.4}/examples/framework_integrations.py +0 -0
- {wavemind-2.2.2 → wavemind-2.2.4}/examples/langchain_memory.py +0 -0
- {wavemind-2.2.2 → wavemind-2.2.4}/examples/llamaindex_retriever.py +0 -0
- {wavemind-2.2.2 → wavemind-2.2.4}/examples/observability/README.md +0 -0
- {wavemind-2.2.2 → wavemind-2.2.4}/examples/observability/docker-compose.yml +0 -0
- {wavemind-2.2.2 → wavemind-2.2.4}/examples/observability/otel-collector.yaml +0 -0
- {wavemind-2.2.2 → wavemind-2.2.4}/examples/observability/prometheus-alerts.yml +0 -0
- {wavemind-2.2.2 → wavemind-2.2.4}/examples/observability/prometheus.yml +0 -0
- {wavemind-2.2.2 → wavemind-2.2.4}/examples/production-index-profile/README.md +0 -0
- {wavemind-2.2.2 → wavemind-2.2.4}/examples/production-index-profile/docker-compose.yml +0 -0
- {wavemind-2.2.2 → wavemind-2.2.4}/examples/research_notebook_memory.py +0 -0
- {wavemind-2.2.2 → wavemind-2.2.4}/examples/sharded_memory.py +0 -0
- {wavemind-2.2.2 → wavemind-2.2.4}/install.bat +0 -0
- {wavemind-2.2.2 → wavemind-2.2.4}/install.sh +0 -0
- {wavemind-2.2.2 → wavemind-2.2.4}/requirements-optional.txt +0 -0
- {wavemind-2.2.2 → wavemind-2.2.4}/requirements.txt +0 -0
- {wavemind-2.2.2 → wavemind-2.2.4}/setup.cfg +0 -0
- {wavemind-2.2.2 → wavemind-2.2.4}/tests/test_agent_memory_benchmark.py +0 -0
- {wavemind-2.2.2 → wavemind-2.2.4}/tests/test_ann_index_curve_benchmark.py +0 -0
- {wavemind-2.2.2 → wavemind-2.2.4}/tests/test_api.py +0 -0
- {wavemind-2.2.2 → wavemind-2.2.4}/tests/test_api_process_persistence.py +0 -0
- {wavemind-2.2.2 → wavemind-2.2.4}/tests/test_benchmark_brief.py +0 -0
- {wavemind-2.2.2 → wavemind-2.2.4}/tests/test_benchmark_charts.py +0 -0
- {wavemind-2.2.2 → wavemind-2.2.4}/tests/test_benchmark_leaderboard.py +0 -0
- {wavemind-2.2.2 → wavemind-2.2.4}/tests/test_benchmark_registry.py +0 -0
- {wavemind-2.2.2 → wavemind-2.2.4}/tests/test_benchmark_report.py +0 -0
- {wavemind-2.2.2 → wavemind-2.2.4}/tests/test_chroma_migration_example.py +0 -0
- {wavemind-2.2.2 → wavemind-2.2.4}/tests/test_core_persistence.py +0 -0
- {wavemind-2.2.2 → wavemind-2.2.4}/tests/test_dynamic_memory_benchmark.py +0 -0
- {wavemind-2.2.2 → wavemind-2.2.4}/tests/test_examples.py +0 -0
- {wavemind-2.2.2 → wavemind-2.2.4}/tests/test_field_graph.py +0 -0
- {wavemind-2.2.2 → wavemind-2.2.4}/tests/test_field_graph_integration.py +0 -0
- {wavemind-2.2.2 → wavemind-2.2.4}/tests/test_field_memory_dynamics_benchmark.py +0 -0
- {wavemind-2.2.2 → wavemind-2.2.4}/tests/test_framework_adapters.py +0 -0
- {wavemind-2.2.2 → wavemind-2.2.4}/tests/test_import_benchmark.py +0 -0
- {wavemind-2.2.2 → wavemind-2.2.4}/tests/test_indexes_encoders.py +0 -0
- {wavemind-2.2.2 → wavemind-2.2.4}/tests/test_langchain_integration.py +0 -0
- {wavemind-2.2.2 → wavemind-2.2.4}/tests/test_locomo_memory_benchmark.py +0 -0
- {wavemind-2.2.2 → wavemind-2.2.4}/tests/test_long_memory_evidence_benchmark.py +0 -0
- {wavemind-2.2.2 → wavemind-2.2.4}/tests/test_longmemeval_answer_benchmark.py +0 -0
- {wavemind-2.2.2 → wavemind-2.2.4}/tests/test_longmemeval_memory_benchmark.py +0 -0
- {wavemind-2.2.2 → wavemind-2.2.4}/tests/test_memory_competitor_benchmark.py +0 -0
- {wavemind-2.2.2 → wavemind-2.2.4}/tests/test_multimodal.py +0 -0
- {wavemind-2.2.2 → wavemind-2.2.4}/tests/test_nomiracl_russian_benchmark.py +0 -0
- {wavemind-2.2.2 → wavemind-2.2.4}/tests/test_observability.py +0 -0
- {wavemind-2.2.2 → wavemind-2.2.4}/tests/test_observability_docs.py +0 -0
- {wavemind-2.2.2 → wavemind-2.2.4}/tests/test_open_retrieval_benchmark.py +0 -0
- {wavemind-2.2.2 → wavemind-2.2.4}/tests/test_packaging_files.py +0 -0
- {wavemind-2.2.2 → wavemind-2.2.4}/tests/test_postgres_storage.py +0 -0
- {wavemind-2.2.2 → wavemind-2.2.4}/tests/test_production_index_profile.py +0 -0
- {wavemind-2.2.2 → wavemind-2.2.4}/tests/test_production_load_benchmark.py +0 -0
- {wavemind-2.2.2 → wavemind-2.2.4}/tests/test_scale_plan.py +0 -0
- {wavemind-2.2.2 → wavemind-2.2.4}/tests/test_semantic_and_latency.py +0 -0
- {wavemind-2.2.2 → wavemind-2.2.4}/tests/test_sharding.py +0 -0
- {wavemind-2.2.2 → wavemind-2.2.4}/wavemind/__main__.py +0 -0
- {wavemind-2.2.2 → wavemind-2.2.4}/wavemind/api.py +0 -0
- {wavemind-2.2.2 → wavemind-2.2.4}/wavemind/benchmark.py +0 -0
- {wavemind-2.2.2 → wavemind-2.2.4}/wavemind/core.py +0 -0
- {wavemind-2.2.2 → wavemind-2.2.4}/wavemind/encoders.py +0 -0
- {wavemind-2.2.2 → wavemind-2.2.4}/wavemind/field_graph.py +0 -0
- {wavemind-2.2.2 → wavemind-2.2.4}/wavemind/importers.py +0 -0
- {wavemind-2.2.2 → wavemind-2.2.4}/wavemind/indexes.py +0 -0
- {wavemind-2.2.2 → wavemind-2.2.4}/wavemind/integrations/__init__.py +0 -0
- {wavemind-2.2.2 → wavemind-2.2.4}/wavemind/integrations/autogen.py +0 -0
- {wavemind-2.2.2 → wavemind-2.2.4}/wavemind/integrations/crewai.py +0 -0
- {wavemind-2.2.2 → wavemind-2.2.4}/wavemind/integrations/langchain.py +0 -0
- {wavemind-2.2.2 → wavemind-2.2.4}/wavemind/integrations/langgraph.py +0 -0
- {wavemind-2.2.2 → wavemind-2.2.4}/wavemind/integrations/llamaindex.py +0 -0
- {wavemind-2.2.2 → wavemind-2.2.4}/wavemind/multimodal.py +0 -0
- {wavemind-2.2.2 → wavemind-2.2.4}/wavemind/observability.py +0 -0
- {wavemind-2.2.2 → wavemind-2.2.4}/wavemind/scale.py +0 -0
- {wavemind-2.2.2 → wavemind-2.2.4}/wavemind/sharding.py +0 -0
- {wavemind-2.2.2 → wavemind-2.2.4}/wavemind/storage.py +0 -0
- {wavemind-2.2.2 → wavemind-2.2.4}/wavemind/studio.py +0 -0
- {wavemind-2.2.2 → wavemind-2.2.4}/wavemind.egg-info/dependency_links.txt +0 -0
- {wavemind-2.2.2 → wavemind-2.2.4}/wavemind.egg-info/entry_points.txt +0 -0
- {wavemind-2.2.2 → wavemind-2.2.4}/wavemind.egg-info/top_level.txt +0 -0
@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: wavemind
-Version: 2.2.
+Version: 2.2.4
 Summary: Local-first dynamic memory field with vector search and wave-field re-ranking
 License-Expression: MIT
 Project-URL: Homepage, https://github.com/CaspianG/wavemind
@@ -23,6 +23,8 @@ Requires-Dist: faiss-cpu>=1.8; platform_system != "Windows" and extra == "indexe
 Requires-Dist: qdrant-client>=1.9; extra == "indexes"
 Provides-Extra: postgres
 Requires-Dist: psycopg[binary]>=3.1; extra == "postgres"
+Provides-Extra: redis
+Requires-Dist: redis>=5.0; extra == "redis"
 Provides-Extra: otel
 Requires-Dist: opentelemetry-api>=1.25; extra == "otel"
 Requires-Dist: opentelemetry-sdk>=1.25; extra == "otel"
@@ -38,6 +40,7 @@ Requires-Dist: annoy>=1.17; extra == "production"
 Requires-Dist: faiss-cpu>=1.8; platform_system != "Windows" and extra == "production"
 Requires-Dist: qdrant-client>=1.9; extra == "production"
 Requires-Dist: psycopg[binary]>=3.1; extra == "production"
+Requires-Dist: redis>=5.0; extra == "production"
 Requires-Dist: opentelemetry-api>=1.25; extra == "production"
 Requires-Dist: opentelemetry-sdk>=1.25; extra == "production"
 Requires-Dist: opentelemetry-exporter-otlp>=1.25; extra == "production"
@@ -538,14 +541,15 @@ Checked-in result:
 
 | profile | result |
 |---|---:|
-| Cluster planner | 4096 namespaces, 4 nodes, replication factor 2,
+| Cluster planner | 4096 namespaces, 4 nodes, replication factor 2, node-loss availability `1.000`, zone-loss availability `1.000`, write quorum `2`. |
 | Hot cache | 2000 lookups, hit rate `0.920`, p99 lookup `0.01 ms`. |
-|
+| Replicated runtime | 3 physical WaveMind stores, replication factor 3, write quorum 2, node-loss recall `true`, repair copied `1` missing record, p99 query-after-loss `1.16 ms`. |
+| Structured payloads | image/audio/table/event retrieval, precision@1 `1.000`, p99 `0.69 ms`. |
 
-This profile validates routing,
-It is not a 10M-vector load test.
-
-hardware.
+This profile validates routing, quorum-replicated runtime behavior, cache
+behavior, and structured payload handling. It is not a 10M-vector load test.
+Real 100k, 1M, and 10M latency claims should come from service-backed
+FAISS/Qdrant/pgvector load tests on production-like hardware.
 
 Cluster placement planning:
 
@@ -562,11 +566,30 @@ wavemind cluster-plan \
 
 This uses deterministic rendezvous placement so each namespace has a primary
 and replica set. The emitted Kubernetes StatefulSet manifest is a deployment
-starting point
-
+starting point. Runtime quorum replication is available through
+`ReplicatedWaveMind`; consensus across independently managed network services
+should still use a production database or service layer.
 
 The same planner is available over HTTP as `POST /cluster-plan`.
 
+Maintenance worker:
+
+```sh
+wavemind maintenance --namespace user:42 --consolidate-steps 10 --consolidate-concepts --json
+```
+
+This runs one deterministic maintenance pass: expired-memory purge, optional
+field/concept consolidation, and index-health repair. Production deployments can
+call the same command from cron, systemd, Kubernetes CronJobs, Celery, RQ, or
+Temporal.
+
+Hot-cache options:
+
+| cache | use case |
+|---|---|
+| `HotMemoryCache` | in-process local API/server cache. |
+| `RedisHotMemoryCache` | shared cache for multiple API workers. Install with `pip install "wavemind[redis]"`. |
+
 ## Structured And Multimodal Memory
 
 WaveMind can store non-text memories as structured text plus metadata. This is
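Reviewer note: the hunk above says the planner uses deterministic rendezvous placement to give each namespace a primary and a replica set. The sketch below shows the general technique only. It is not taken from `wavemind/cluster.py`; the names `rendezvous_score` and `place_namespace` and the SHA-256 scoring are assumptions made for illustration.

```python
import hashlib


def rendezvous_score(namespace: str, node_id: str) -> int:
    """Hash the (namespace, node) pair into a deterministic integer score."""
    digest = hashlib.sha256(f"{namespace}|{node_id}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def place_namespace(namespace: str, node_ids: list[str], replication_factor: int) -> list[str]:
    """Return the replica set for a namespace; the first entry acts as primary."""
    if not 1 <= replication_factor <= len(node_ids):
        raise ValueError("replication_factor must be between 1 and the node count")
    # Every node is scored independently, so the ranking never depends on
    # the order in which nodes are listed.
    ranked = sorted(node_ids, key=lambda n: rendezvous_score(namespace, n), reverse=True)
    return ranked[:replication_factor]


# Hypothetical node ids, mirroring the 4-node / replication-factor-2 profile.
nodes = ["node-a", "node-b", "node-c", "node-d"]
replicas = place_namespace("tenant:a", nodes, replication_factor=2)
print(replicas)
```

A useful property of this scheme is that removing a node outside a namespace's replica set leaves that namespace's placement unchanged, so only namespaces that lived on the lost node need to move.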
@@ -1017,8 +1040,8 @@ Current read:
 | LongMemEval full retrieval | On the official LongMemEval-S cleaned file, 470 non-abstention session-level questions, WaveMind reaches `evidence_recall@5 0.782` and `precision@1 0.696`; Chroma static reaches `0.518` / `0.355`; Qdrant static reaches `0.520` / `0.355`. | This is now the strongest public memory result in the repo. It is retrieval-only, not final answer quality. |
 | LongMemEval 50-query smoke | On the first 50 non-abstention LongMemEval-S questions, WaveMind reaches `evidence_recall@5 0.920`, `precision@1 0.760`, and `MRR@5 0.827`; Chroma/Qdrant static reach `0.600`, `0.260`, and `0.385`. | This is the fast regression profile for checking current changes before rerunning the full LongMemEval profile. WaveMind wins on quality; latency still needs work. |
 | ANN/index curve | At 50000 generated 128-d vectors, NumPy exact keeps `recall@10 1.000` at `6.49 ms`; quantized int8 keeps `0.934` at `24.92 ms`; Annoy is faster at `4.92 ms` but drops to `0.730` recall; Qdrant local keeps `1.000` recall at `43.49 ms`. | Current local scale boundary is clear: quantized search needs kernel work, Annoy needs tuning/FAISS, and Qdrant should be tested in service mode for a fair production comparison. |
-| Production load | At 100000 generated 128-d vectors, service-mode Qdrant reaches `recall@10 1.000`, avg `10.
-| Scale readiness | Deterministic 1M-memory simulation validates 4096 namespace placements over 4 nodes with replication factor 2,
+| Production load | At 100000 generated 128-d vectors, service-mode Qdrant reaches `recall@10 1.000`, avg `10.28 ms`, p99 `21.26 ms`. At 1M, tuned Qdrant reaches `recall@10 0.984`, avg `116.80 ms`, p99 `209.28 ms`; an EF sweep finds `recall@10 0.977`, avg `64.76 ms`, p99 `103.77 ms` at `hnsw_ef=2048` on 30 queries. | 100k is production-grade on the tested machine. 1M recall is now strong, but p99 still needs tuning before claiming a stable sub-100 ms SLO. |
+| Scale readiness | Deterministic 1M-memory simulation validates 4096 namespace placements over 4 nodes with replication factor 2, node-loss availability `1.000`, zone-loss availability `1.000`, hot-cache hit rate `0.920`, quorum-replicated runtime recall after node loss, replica repair, and structured payload precision@1 `1.000`. | This proves routing, cache, payload, and replicated-runtime foundations. It is not a 10M-vector latency claim; real 10M latency still needs service-backed load tests on larger hardware. |
 | Memory competitor adapters | WaveMind reaches `precision@1 0.80`, `precision@3 1.00`, stale suppression `1.00` on the small adapter profile. Mem0, Zep, and LangGraph are listed as skipped unless their real packages/services are configured. | This prevents fake competitor claims. The adapter harness is ready; real Mem0/Zep/LangGraph results still need configured installs. |
 | LongMemEval local answer generation | With the same local Ollama `qwen2.5:1.5b`, WaveMind reaches `exact_match 0.240`, `contains_answer 0.380`, `token_f1 0.333`, and `evidence_recall@5 0.920`; Chroma and Qdrant static both reach `0.120`, `0.160`, `0.170`, and `0.600`. | This is the first checked-in end-to-end answer benchmark against Chroma/Qdrant. It is still a 50-question lightweight smoke run, not a full LongMemEval leaderboard score. |
 
@@ -1035,8 +1058,9 @@ Current read:
 | NoMIRACL Russian retrieval | Russian human-annotated multilingual relevance over compact candidate passages. | implemented | WaveMind / Chroma / Qdrant | Keep same-embedding `nDCG@10` at parity, then rerun with sentence-transformers and full MIRACL Russian when disk/service capacity allows it. |
 | ANN/VectorDBBench-style local curve | Recall/latency tradeoff for candidate indexes on generated vectors. | implemented | NumPy exact / quantized int8 / Annoy / Qdrant local | Use this as the local engineering curve; official VectorDBBench remains future work. |
 | Production index profile | Docker-backed 50000-vector profile for persisted FAISS, Qdrant service, and PostgreSQL/pgvector HNSW. | implemented | FAISS / Qdrant service / pgvector | Keep service-mode candidate generation above `0.95` recall@10 and below 10 ms average query latency at 50000 vectors. |
-| Production load profile | 100k and 1M service-backed candidate-index checks. | implemented | Qdrant service / pgvector HNSW / FAISS persisted | 100k
-|
+| Production load profile | 100k and 1M service-backed candidate-index checks with p95/p99 latency. | implemented | Qdrant service / pgvector HNSW / FAISS persisted | Keep 100k at recall@10 `1.000`; push 1M p99 below 100 ms with recall@10 >= 0.95. |
+| Qdrant 1M HNSW ef sweep | One 1M Qdrant collection queried with multiple `hnsw_ef` values. | implemented | Qdrant service | Repeat with 100+ queries and collection-level HNSW build parameters before claiming a stable 1M SLO. |
+| Scale readiness profile | Cluster placement, node/zone-loss simulation, quorum report, replicated runtime, hot-cache behavior, and structured/multimodal payload retrieval. | implemented | Mem0 / Zep / LangGraph persistent memory / GraphRAG target adapters | Keep quorum replication and repair green while adding larger service-backed 10M load tests. |
 | Memory competitor adapter profile | Dynamic-memory scenario wired for external memory frameworks. | implemented | Mem0 / Zep / LangGraph persistent memory | Report real competitor results only when their packages/services are explicitly configured. |
 | [BEIR](https://github.com/beir-cellar/beir) | Standard zero-shot information retrieval quality. | planned | Chroma / Qdrant / FAISS | Stay within 0.02 `nDCG@10` on identical embeddings. |
 | [MTEB Retrieval](https://github.com/embeddings-benchmark/mteb) | Separates encoder quality from retrieval-store quality. | planned | Chroma / Qdrant / FAISS | Prove WaveMind does not reduce same-embedding retrieval quality. |
@@ -1148,7 +1172,7 @@ If Chroma or Qdrant are not installed, use the baseline-only command:
 python benchmarks/locomo_memory_benchmark.py --dataset benchmarks/data/locomo10.json --engines wavemind static --top-k 5
 ```
 
-## Namespace Sharding
+## Namespace Sharding And Replication
 
 For multi-tenant local deployments, `ShardedWaveMind` routes namespaces across
 multiple SQLite files:
@@ -1165,9 +1189,35 @@ print(memory.stats())
 memory.close()
 ```
 
-
-
-
+For HA-style local or service-mode deployments, `ReplicatedWaveMind` writes each
+namespace to a deterministic replica set and enforces read/write quorum:
+
+```python
+from wavemind import ReplicatedWaveMind
+
+memory = ReplicatedWaveMind(
+    root_path="./state/wavemind-replicas",
+    nodes=[
+        {"id": "node-a", "address": "10.0.0.1:8000", "zone": "zone-a"},
+        {"id": "node-b", "address": "10.0.0.2:8000", "zone": "zone-b"},
+        {"id": "node-c", "address": "10.0.0.3:8000", "zone": "zone-c"},
+    ],
+    replication_factor=3,
+)
+
+memory.remember("Tenant A prefers short support replies.", namespace="tenant:a")
+print(memory.query("support replies", namespace="tenant:a", top_k=3))
+
+memory.set_node_available("node-a", False)
+print(memory.query("support replies", namespace="tenant:a", top_k=3))
+memory.close()
+```
+
+The runtime uses separate durable stores per node, quorum writes, quorum reads,
+merged replica results, and `repair_namespace()` for recovered replicas. It is
+the production foundation for namespace-level HA; for full consensus across
+independent network services, deploy WaveMind with Postgres/Qdrant/ops-layer
+replication.
 
 Checked-in official LoCoMo retrieval result:
 
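Reviewer note: the added README text describes quorum writes, quorum reads, and repair of recovered replicas, and the scale-readiness table reports write quorum 2 at replication factor 3. The toy below illustrates that idea in memory. It does not reproduce `wavemind/replication.py`; `ToyReplicatedStore`, `QuorumError`, and the majority rule `N // 2 + 1` are assumptions for this sketch, and it omits the versioning and result merging a real store would need.

```python
class QuorumError(RuntimeError):
    """Raised when too few replicas are reachable to form a quorum."""


class ToyReplicatedStore:
    """In-memory stand-in for N replicas with majority quorum (illustrative only)."""

    def __init__(self, node_ids):
        self.replicas = {node_id: {} for node_id in node_ids}
        self.available = {node_id: True for node_id in node_ids}
        # Majority quorum: 2 of 3 replicas, so any read quorum overlaps
        # any write quorum in at least one replica.
        self.quorum = len(node_ids) // 2 + 1

    def _live(self):
        return [n for n, up in self.available.items() if up]

    def write(self, key, value):
        live = self._live()
        if len(live) < self.quorum:
            raise QuorumError("not enough live replicas for a quorum write")
        for node_id in live:
            self.replicas[node_id][key] = value
        return len(live)

    def read(self, key):
        live = self._live()
        if len(live) < self.quorum:
            raise QuorumError("not enough live replicas for a quorum read")
        values = [self.replicas[n][key] for n in live if key in self.replicas[n]]
        return values[0] if values else None

    def repair(self, node_id):
        """Copy records a recovered replica missed; return how many were copied."""
        copied = 0
        for other_id, records in self.replicas.items():
            if other_id == node_id or not self.available[other_id]:
                continue
            for key, value in records.items():
                if key not in self.replicas[node_id]:
                    self.replicas[node_id][key] = value
                    copied += 1
        return copied


store = ToyReplicatedStore(["node-a", "node-b", "node-c"])
store.available["node-a"] = False            # simulate node loss
store.write("tenant:a/pref", "short support replies")
value_after_loss = store.read("tenant:a/pref")
store.available["node-a"] = True             # node recovers
copied = store.repair("node-a")
print(value_after_loss, copied)
```

With one of three nodes down, writes and reads still succeed because two live replicas satisfy the quorum; after recovery, repair copies the one record the node missed.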
@@ -1307,15 +1357,16 @@ Checked-in production load points:
 
 ```sh
 python benchmarks/production_load_benchmark.py --sizes 100000 --dim 128 --queries 100 --top-k 10 --engines qdrant-service pgvector faiss-persisted
-python benchmarks/production_load_benchmark.py --sizes 1000000 --dim 128 --queries 50 --top-k 10 --engines qdrant-service --output benchmarks/
+python benchmarks/production_load_benchmark.py --sizes 1000000 --dim 128 --queries 50 --top-k 10 --engines qdrant-service --output benchmarks/production_load_qdrant_1m_tuned_results.json
 ```
 
-| vectors | engine | recall@10 | avg latency | p95 latency | build |
-
-| 100000 | Qdrant service | 1.000 | 10.
-| 100000 | WaveMind pgvector | 0.736 | 17.76 ms | 23.48 ms | 455703.7 ms |
-| 100000 | WaveMind faiss-persisted | skipped | - | - | - |
-| 1000000 | Qdrant service | 0.
+| vectors | engine | recall@10 | avg latency | p95 latency | p99 latency | build |
+|---:|---|---:|---:|---:|---:|---:|
+| 100000 | Qdrant service | 1.000 | 10.28 ms | 18.97 ms | 21.26 ms | 27439.3 ms |
+| 100000 | WaveMind pgvector | 0.736 | 17.76 ms | 23.48 ms | - | 455703.7 ms |
+| 100000 | WaveMind faiss-persisted | skipped | - | - | - | - |
+| 1000000 | Qdrant service tuned | 0.984 | 116.80 ms | 153.84 ms | 209.28 ms | 450674.6 ms |
+| 1000000 | Qdrant `hnsw_ef=2048` sweep point | 0.977 | 64.76 ms | 91.18 ms | 103.77 ms | 451912.4 ms |
 
 Read this as an engineering curve, not an official VectorDBBench result. Annoy
 is faster than exact NumPy at 50000 vectors but loses too much recall with the
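Reviewer note: the table in this hunk reports `recall@10` plus average, p95, and p99 latency. The sketch below shows one common way to compute such figures. It is not the code in `benchmarks/production_load_benchmark.py`; the helper names and the nearest-rank percentile definition are assumptions for illustration.

```python
import math


def recall_at_k(retrieved, relevant, k=10):
    """Fraction of the true top-k neighbours that appear in the retrieved top-k."""
    truth = set(relevant[:k])
    if not truth:
        return 0.0
    hits = sum(1 for item in retrieved[:k] if item in truth)
    return hits / len(truth)


def percentile(samples, pct):
    """Nearest-rank percentile: the sample at rank ceil(pct * n / 100)."""
    if not samples:
        raise ValueError("samples must not be empty")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


# Made-up inputs: ids from an exact search vs. an approximate index,
# and synthetic per-query latencies of 1.0 .. 100.0 ms.
exact_ids = list(range(10))
approx_ids = [0, 1, 2, 3, 4, 5, 6, 7, 8, 42]
latencies_ms = [float(i) for i in range(1, 101)]

print(recall_at_k(approx_ids, exact_ids))
print(percentile(latencies_ms, 95), percentile(latencies_ms, 99))
```

Under this nearest-rank definition, a p99 over only 30 queries is simply the slowest query, which is one reason the diff's own notes ask for 100+ queries before treating a 1M p99 as stable.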
@@ -1327,9 +1378,9 @@ FAISS persistence and service-mode Qdrant now both preserve exact recall at
 `WAVEMIND_PGVECTOR_EF_SEARCH=400`, which improves recall materially but still
 misses the `0.95` production target and is slower than the other two profiles.
 The 100k load profile shows Qdrant service is already viable for candidate
-generation
-
-
+generation on the tested machine. The tuned 1M profile crosses the recall target,
+and the EF sweep gets close to the p99 latency target, but 1M should still be
+treated as tuning-in-progress until a 100+ query p99 run stays below 100 ms.
 If a required package, service, or environment variable is missing, the runner
 marks that engine as `skipped` instead of silently falling back to another
 backend.
@@ -1494,7 +1545,7 @@ python benchmarks/dynamic_memory_benchmark.py --engines wavemind chroma --memori
 | Dynamic memory priority | Wave-field hotness, TTL, priority | Metadata/filter driven | Payload/filter driven |
 | Built-in forgetting | TTL and explicit forget | Manual delete/filtering | Manual delete/filtering |
 | Best fit | Small to medium memory streams with dynamic recall | Local RAG apps and prototypes | Large-scale vector search |
-| Scale target today |
+| Scale target today | Local exact mode for small streams; FAISS/Qdrant/pgvector plus replicated namespaces for production paths | Larger than WaveMind local exact mode | Production vector scale |
 
 WaveMind is not trying to replace dedicated vector databases at scale. The intended product gap is dynamic priority: frequently used memories can become hotter while old or low-priority memories fade. For static RAG over large document collections, use a mature vector database. For memory that needs persistence, scoped recall, TTL, forgetting, and reinforcement, WaveMind is designed to sit above or beside the vector index.
 
@@ -1522,8 +1573,9 @@ If you already use Chroma for local memory, see the practical migration guide:
 from SQLite on load/build, so large service-mode deployments still need a
 measured rebuild strategy and index-health monitoring.
 - The persisted FAISS backend validates a snapshot against current memory ids
-and avoids unnecessary FAISS rebuilds when the snapshot matches.
-a single-node flat-index path
+and avoids unnecessary FAISS rebuilds when the snapshot matches. FAISS itself
+is a single-node flat-index path; use `ReplicatedWaveMind` or external
+database/service replication when that is not enough.
 - The `quantized` backend is an explicit int8 candidate-index experiment. It
 reduces vector precision and must be benchmarked per workload before use.
 - The synthetic long-term memory evidence benchmark is useful for regression and product-shape proof, but public claims should lean on LoCoMo and LongMemEval instead.
@@ -488,14 +488,15 @@ Checked-in result:
 
 | profile | result |
 |---|---:|
-| Cluster planner | 4096 namespaces, 4 nodes, replication factor 2,
+| Cluster planner | 4096 namespaces, 4 nodes, replication factor 2, node-loss availability `1.000`, zone-loss availability `1.000`, write quorum `2`. |
 | Hot cache | 2000 lookups, hit rate `0.920`, p99 lookup `0.01 ms`. |
-|
+| Replicated runtime | 3 physical WaveMind stores, replication factor 3, write quorum 2, node-loss recall `true`, repair copied `1` missing record, p99 query-after-loss `1.16 ms`. |
+| Structured payloads | image/audio/table/event retrieval, precision@1 `1.000`, p99 `0.69 ms`. |
 
-This profile validates routing,
-It is not a 10M-vector load test.
-
-hardware.
+This profile validates routing, quorum-replicated runtime behavior, cache
+behavior, and structured payload handling. It is not a 10M-vector load test.
+Real 100k, 1M, and 10M latency claims should come from service-backed
+FAISS/Qdrant/pgvector load tests on production-like hardware.
 
 Cluster placement planning:
 
@@ -512,11 +513,30 @@ wavemind cluster-plan \
|
|
|
512
513
|
|
|
513
514
|
This uses deterministic rendezvous placement so each namespace has a primary
|
|
514
515
|
and replica set. The emitted Kubernetes StatefulSet manifest is a deployment
|
|
515
|
-
starting point
|
|
516
|
-
|
|
516
|
+
starting point. Runtime quorum replication is available through
|
|
517
|
+
`ReplicatedWaveMind`; consensus across independently managed network services
|
|
518
|
+
should still use a production database or service layer.
|
|
517
519
|
|
|
518
520
|
The same planner is available over HTTP as `POST /cluster-plan`.
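
The deterministic rendezvous placement mentioned above can be illustrated with a minimal sketch. This is not WaveMind's planner code: the function name `rendezvous_replicas`, the SHA-256 scoring, and the `namespace|node` key format are all assumptions chosen for illustration. It shows only the general technique, where every node is scored per namespace and the highest-scoring nodes form the primary and replica set.

```python
import hashlib


def rendezvous_replicas(namespace, node_ids, replication_factor):
    """Hypothetical helper: rank nodes by a per-(namespace, node) hash score."""
    if not 1 <= replication_factor <= len(node_ids):
        raise ValueError("replication_factor must be between 1 and the node count")

    def score(node_id):
        # Assumption: SHA-256 over "namespace|node"; any stable hash would do.
        digest = hashlib.sha256(f"{namespace}|{node_id}".encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big")

    # Highest score first; the node id breaks ties so the order is deterministic.
    ranked = sorted(node_ids, key=lambda node_id: (score(node_id), node_id), reverse=True)
    return ranked[:replication_factor]


nodes = ["node-a", "node-b", "node-c", "node-d"]
placement = rendezvous_replicas("user:42", nodes, replication_factor=2)
primary, replicas = placement[0], placement[1:]
print(primary, replicas)
```

The useful property of this scheme is that removing a node only moves namespaces that were placed on it; the relative order of the remaining nodes for a given namespace does not change.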
|
|
519
521
|
|
|
522
|
+
Maintenance worker:
|
|
523
|
+
|
|
524
|
+
```sh
|
|
525
|
+
wavemind maintenance --namespace user:42 --consolidate-steps 10 --consolidate-concepts --json
|
|
526
|
+
```
|
|
527
|
+
|
|
528
|
+
This runs one deterministic maintenance pass: expired-memory purge, optional
|
|
529
|
+
field/concept consolidation, and index-health repair. Production deployments can
|
|
530
|
+
call the same command from cron, systemd, Kubernetes CronJobs, Celery, RQ, or
|
|
531
|
+
Temporal.
|
|
532
|
+
|
|
533
|
+
Hot-cache options:
|
|
534
|
+
|
|
535
|
+
| cache | use case |
|
|
536
|
+
|---|---|
|
|
537
|
+
| `HotMemoryCache` | in-process local API/server cache. |
|
|
538
|
+
| `RedisHotMemoryCache` | shared cache for multiple API workers. Install with `pip install "wavemind[redis]"`. |
|
|
539
|
+
|
|
520
540
|
## Structured And Multimodal Memory
|
|
521
541
|
|
|
522
542
|
WaveMind can store non-text memories as structured text plus metadata. This is
|
|
@@ -967,8 +987,8 @@ Current read:
|
|
|
967
987
|
| LongMemEval full retrieval | On the official LongMemEval-S cleaned file, 470 non-abstention session-level questions, WaveMind reaches `evidence_recall@5 0.782` and `precision@1 0.696`; Chroma static reaches `0.518` / `0.355`; Qdrant static reaches `0.520` / `0.355`. | This is now the strongest public memory result in the repo. It is retrieval-only, not final answer quality. |
|
|
968
988
|
| LongMemEval 50-query smoke | On the first 50 non-abstention LongMemEval-S questions, WaveMind reaches `evidence_recall@5 0.920`, `precision@1 0.760`, and `MRR@5 0.827`; Chroma/Qdrant static reach `0.600`, `0.260`, and `0.385`. | This is the fast regression profile for checking current changes before rerunning the full LongMemEval profile. WaveMind wins on quality; latency still needs work. |
|
|
969
989
|
| ANN/index curve | At 50000 generated 128-d vectors, NumPy exact keeps `recall@10 1.000` at `6.49 ms`; quantized int8 keeps `0.934` at `24.92 ms`; Annoy is faster at `4.92 ms` but drops to `0.730` recall; Qdrant local keeps `1.000` recall at `43.49 ms`. | Current local scale boundary is clear: quantized search needs kernel work, Annoy needs tuning/FAISS, and Qdrant should be tested in service mode for a fair production comparison. |
|
|
970
|
-
| Production load | At 100000 generated 128-d vectors, service-mode Qdrant reaches `recall@10 1.000`, avg `10.
|
|
971
|
-
| Scale readiness | Deterministic 1M-memory simulation validates 4096 namespace placements over 4 nodes with replication factor 2,
|
|
990
|
+
| Production load | At 100000 generated 128-d vectors, service-mode Qdrant reaches `recall@10 1.000`, avg `10.28 ms`, p99 `21.26 ms`. At 1M, tuned Qdrant reaches `recall@10 0.984`, avg `116.80 ms`, p99 `209.28 ms`; an EF sweep finds `recall@10 0.977`, avg `64.76 ms`, p99 `103.77 ms` at `hnsw_ef=2048` on 30 queries. | 100k is production-grade on the tested machine. 1M recall is now strong, but p99 still needs tuning before claiming a stable sub-100 ms SLO. |
|
|
991
|
+
| Scale readiness | Deterministic 1M-memory simulation validates 4096 namespace placements over 4 nodes with replication factor 2, node-loss availability `1.000`, zone-loss availability `1.000`, hot-cache hit rate `0.920`, quorum-replicated runtime recall after node loss, replica repair, and structured payload precision@1 `1.000`. | This proves routing, cache, payload, and replicated-runtime foundations. It is not a 10M-vector latency claim; real 10M latency still needs service-backed load tests on larger hardware. |
|
|
972
992
|
| Memory competitor adapters | WaveMind reaches `precision@1 0.80`, `precision@3 1.00`, stale suppression `1.00` on the small adapter profile. Mem0, Zep, and LangGraph are listed as skipped unless their real packages/services are configured. | This prevents fake competitor claims. The adapter harness is ready; real Mem0/Zep/LangGraph results still need configured installs. |
|
|
973
993
|
| LongMemEval local answer generation | With the same local Ollama `qwen2.5:1.5b`, WaveMind reaches `exact_match 0.240`, `contains_answer 0.380`, `token_f1 0.333`, and `evidence_recall@5 0.920`; Chroma and Qdrant static both reach `0.120`, `0.160`, `0.170`, and `0.600`. | This is the first checked-in end-to-end answer benchmark against Chroma/Qdrant. It is still a 50-question lightweight smoke run, not a full LongMemEval leaderboard score. |
|
|
974
994
|
|
|
@@ -985,8 +1005,9 @@ Current read:
|
|
|
985
1005
|
| NoMIRACL Russian retrieval | Russian human-annotated multilingual relevance over compact candidate passages. | implemented | WaveMind / Chroma / Qdrant | Keep same-embedding `nDCG@10` at parity, then rerun with sentence-transformers and full MIRACL Russian when disk/service capacity allows it. |
|
|
986
1006
|
| ANN/VectorDBBench-style local curve | Recall/latency tradeoff for candidate indexes on generated vectors. | implemented | NumPy exact / quantized int8 / Annoy / Qdrant local | Use this as the local engineering curve; official VectorDBBench remains future work. |
|
|
987
1007
|
| Production index profile | Docker-backed 50000-vector profile for persisted FAISS, Qdrant service, and PostgreSQL/pgvector HNSW. | implemented | FAISS / Qdrant service / pgvector | Keep service-mode candidate generation above `0.95` recall@10 and below 10 ms average query latency at 50000 vectors. |
|
|
988
|
-
| Production load profile | 100k and 1M service-backed candidate-index checks. | implemented | Qdrant service / pgvector HNSW / FAISS persisted | 100k
|
|
989
|
-
|
|
|
1008
|
+
| Production load profile | 100k and 1M service-backed candidate-index checks with p95/p99 latency. | implemented | Qdrant service / pgvector HNSW / FAISS persisted | Keep 100k at recall@10 `1.000`; push 1M p99 below 100 ms with recall@10 >= 0.95. |
|
|
1009
|
+
| Qdrant 1M HNSW ef sweep | One 1M Qdrant collection queried with multiple `hnsw_ef` values. | implemented | Qdrant service | Repeat with 100+ queries and collection-level HNSW build parameters before claiming a stable 1M SLO. |
|
|
1010
|
+
| Scale readiness profile | Cluster placement, node/zone-loss simulation, quorum report, replicated runtime, hot-cache behavior, and structured/multimodal payload retrieval. | implemented | Mem0 / Zep / LangGraph persistent memory / GraphRAG target adapters | Keep quorum replication and repair green while adding larger service-backed 10M load tests. |
|
|
990
1011
|
| Memory competitor adapter profile | Dynamic-memory scenario wired for external memory frameworks. | implemented | Mem0 / Zep / LangGraph persistent memory | Report real competitor results only when their packages/services are explicitly configured. |
|
|
991
1012
|
| [BEIR](https://github.com/beir-cellar/beir) | Standard zero-shot information retrieval quality. | planned | Chroma / Qdrant / FAISS | Stay within 0.02 `nDCG@10` on identical embeddings. |
|
|
992
1013
|
| [MTEB Retrieval](https://github.com/embeddings-benchmark/mteb) | Separates encoder quality from retrieval-store quality. | planned | Chroma / Qdrant / FAISS | Prove WaveMind does not reduce same-embedding retrieval quality. |
|
|
@@ -1098,7 +1119,7 @@ If Chroma or Qdrant are not installed, use the baseline-only command:
|
|
|
1098
1119
|
python benchmarks/locomo_memory_benchmark.py --dataset benchmarks/data/locomo10.json --engines wavemind static --top-k 5
|
|
1099
1120
|
```
|
|
1100
1121
|
|
|
1101
|
-
## Namespace Sharding
|
|
1122
|
+
## Namespace Sharding And Replication
|
|
1102
1123
|
|
|
1103
1124
|
For multi-tenant local deployments, `ShardedWaveMind` routes namespaces across
|
|
1104
1125
|
multiple SQLite files:
|
|
@@ -1115,9 +1136,35 @@ print(memory.stats())
|
|
|
1115
1136
|
memory.close()
|
|
1116
1137
|
```
|
|
1117
1138
|
|
|
1118
|
-
|
|
1119
|
-
|
|
1120
|
-
|
|
1139
|
+
For HA-style local or service-mode deployments, `ReplicatedWaveMind` writes each
|
|
1140
|
+
namespace to a deterministic replica set and enforces read/write quorum:
|
|
1141
|
+
|
|
1142
|
+
```python
|
|
1143
|
+
from wavemind import ReplicatedWaveMind
|
|
1144
|
+
|
|
1145
|
+
memory = ReplicatedWaveMind(
|
|
1146
|
+
root_path="./state/wavemind-replicas",
|
|
1147
|
+
nodes=[
|
|
1148
|
+
{"id": "node-a", "address": "10.0.0.1:8000", "zone": "zone-a"},
|
|
1149
|
+
{"id": "node-b", "address": "10.0.0.2:8000", "zone": "zone-b"},
|
|
1150
|
+
{"id": "node-c", "address": "10.0.0.3:8000", "zone": "zone-c"},
|
|
1151
|
+
],
|
|
1152
|
+
replication_factor=3,
|
|
1153
|
+
)
|
|
1154
|
+
|
|
1155
|
+
memory.remember("Tenant A prefers short support replies.", namespace="tenant:a")
|
|
1156
|
+
print(memory.query("support replies", namespace="tenant:a", top_k=3))
|
|
1157
|
+
|
|
1158
|
+
memory.set_node_available("node-a", False)
|
|
1159
|
+
print(memory.query("support replies", namespace="tenant:a", top_k=3))
|
|
1160
|
+
memory.close()
|
|
1161
|
+
```
|
|
1162
|
+
|
|
1163
|
+
The runtime uses separate durable stores per node, quorum writes, quorum reads,
|
|
1164
|
+
merged replica results, and `repair_namespace()` for recovered replicas. It is
|
|
1165
|
+
the production foundation for namespace-level HA; for full consensus across
|
|
1166
|
+
independent network services, deploy WaveMind with Postgres/Qdrant/ops-layer
|
|
1167
|
+
replication.
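
The quorum behaviour described above can be reasoned about with a few lines of arithmetic. The sketch below is an illustration only: the helper names are hypothetical, and the majority rule is an assumption that happens to match the reported write quorum of 2 at replication factor 3. It is not taken from the `ReplicatedWaveMind` implementation.

```python
def majority_quorum(replication_factor):
    """Assumed rule: a quorum is a strict majority of the replica set."""
    return replication_factor // 2 + 1


def quorums_overlap(replication_factor, write_quorum, read_quorum):
    """True when every read quorum shares at least one replica with every write quorum."""
    return read_quorum + write_quorum > replication_factor


def can_serve(replica_ids, available_ids, quorum):
    """True when enough replicas of a namespace are reachable to meet the quorum."""
    return len(set(replica_ids) & set(available_ids)) >= quorum


replicas = ["node-a", "node-b", "node-c"]
write_quorum = majority_quorum(len(replicas))

# One node lost: two replicas remain, which still meets a quorum of 2.
print(can_serve(replicas, {"node-b", "node-c"}, write_quorum))
# Two nodes lost: a single replica cannot meet the quorum.
print(can_serve(replicas, {"node-c"}, write_quorum))
# Replication factor 2 with read quorum 1 and write quorum 2 still overlaps.
print(quorums_overlap(2, write_quorum=2, read_quorum=1))
```

Under these assumptions, a three-replica namespace tolerates the loss of one node, which is the scenario the example above exercises with `set_node_available("node-a", False)`.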
|
|
1121
1168
|
|
|
1122
1169
|
Checked-in official LoCoMo retrieval result:
|
|
1123
1170
|
|
|
@@ -1257,15 +1304,16 @@ Checked-in production load points:
|
|
|
1257
1304
|
|
|
1258
1305
|
```sh
|
|
1259
1306
|
python benchmarks/production_load_benchmark.py --sizes 100000 --dim 128 --queries 100 --top-k 10 --engines qdrant-service pgvector faiss-persisted
|
|
1260
|
-
python benchmarks/production_load_benchmark.py --sizes 1000000 --dim 128 --queries 50 --top-k 10 --engines qdrant-service --output benchmarks/
|
|
1307
|
+
python benchmarks/production_load_benchmark.py --sizes 1000000 --dim 128 --queries 50 --top-k 10 --engines qdrant-service --output benchmarks/production_load_qdrant_1m_tuned_results.json
|
|
1261
1308
|
```
|
|
1262
1309
|
|
|
1263
|
-
| vectors | engine | recall@10 | avg latency | p95 latency | build |
|
|
1264
|
-
|
|
1265
|
-
| 100000 | Qdrant service | 1.000 | 10.
|
|
1266
|
-
| 100000 | WaveMind pgvector | 0.736 | 17.76 ms | 23.48 ms | 455703.7 ms |
|
|
1267
|
-
| 100000 | WaveMind faiss-persisted | skipped | - | - | - |
|
|
1268
|
-
| 1000000 | Qdrant service | 0.
|
|
1310
|
+
| vectors | engine | recall@10 | avg latency | p95 latency | p99 latency | build |
|
|
1311
|
+
|---:|---|---:|---:|---:|---:|---:|
|
|
1312
|
+
| 100000 | Qdrant service | 1.000 | 10.28 ms | 18.97 ms | 21.26 ms | 27439.3 ms |
|
|
1313
|
+
| 100000 | WaveMind pgvector | 0.736 | 17.76 ms | 23.48 ms | - | 455703.7 ms |
|
|
1314
|
+
| 100000 | WaveMind faiss-persisted | skipped | - | - | - | - |
|
|
1315
|
+
| 1000000 | Qdrant service tuned | 0.984 | 116.80 ms | 153.84 ms | 209.28 ms | 450674.6 ms |
|
|
1316
|
+
| 1000000 | Qdrant `hnsw_ef=2048` sweep point | 0.977 | 64.76 ms | 91.18 ms | 103.77 ms | 451912.4 ms |
|
|
1269
1317
|
|
|
1270
1318
|
Read this as an engineering curve, not an official VectorDBBench result. Annoy
|
|
1271
1319
|
is faster than exact NumPy at 50000 vectors but loses too much recall with the
|
|
@@ -1277,9 +1325,9 @@ FAISS persistence and service-mode Qdrant now both preserve exact recall at
|
|
|
1277
1325
|
`WAVEMIND_PGVECTOR_EF_SEARCH=400`, which improves recall materially but still
|
|
1278
1326
|
misses the `0.95` production target and is slower than the other two profiles.
|
|
1279
1327
|
The 100k load profile shows Qdrant service is already viable for candidate
|
|
1280
|
-
generation
|
|
1281
|
-
|
|
1282
|
-
|
|
1328
|
+
generation on the tested machine. The tuned 1M profile crosses the recall target,
|
|
1329
|
+
and the EF sweep gets close to the p99 latency target, but 1M should still be
|
|
1330
|
+
treated as tuning-in-progress until a 100+ query p99 run stays below 100 ms.
|
|
1283
1331
|
If a required package, service, or environment variable is missing, the runner
|
|
1284
1332
|
marks that engine as `skipped` instead of silently falling back to another
|
|
1285
1333
|
backend.
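
To make the recall@10 and p99 columns concrete, here is a minimal sketch of how such metrics are commonly computed. It is not the code in `production_load_benchmark.py`: the function names are hypothetical, and the nearest-rank percentile definition is an assumption. It also shows why a 30-query run cannot separate p99 from the maximum latency under that definition, which is one reason to repeat the sweep with 100+ queries.

```python
import math


def recall_at_k(retrieved_ids, exact_ids, k=10):
    """Fraction of the exact top-k neighbours that the index also returned."""
    expected = set(exact_ids[:k])
    if not expected:
        return 0.0
    return len(expected & set(retrieved_ids[:k])) / len(expected)


def percentile_nearest_rank(samples, pct):
    """Nearest-rank percentile (assumed definition): the ceil(pct% * n)-th smallest sample."""
    if not samples:
        raise ValueError("samples must not be empty")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


# Synthetic latencies in ms, used only to show the sample-size effect.
latencies_30 = [float(ms) for ms in range(1, 31)]
latencies_100 = [float(ms) for ms in range(1, 101)]

# One of the ten exact neighbours is missing from the retrieved list.
print(recall_at_k([1, 2, 3, 4, 5, 6, 7, 8, 9, 99], list(range(1, 11))))
# With 30 samples the nearest-rank p99 is simply the slowest query.
print(percentile_nearest_rank(latencies_30, 99) == max(latencies_30))
# With 100 samples the p99 excludes the single slowest query.
print(percentile_nearest_rank(latencies_100, 99))
```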
|
|
@@ -1444,7 +1492,7 @@ python benchmarks/dynamic_memory_benchmark.py --engines wavemind chroma --memori
|
|
|
1444
1492
|
| Dynamic memory priority | Wave-field hotness, TTL, priority | Metadata/filter driven | Payload/filter driven |
|
|
1445
1493
|
| Built-in forgetting | TTL and explicit forget | Manual delete/filtering | Manual delete/filtering |
|
|
1446
1494
|
| Best fit | Small to medium memory streams with dynamic recall | Local RAG apps and prototypes | Large-scale vector search |
|
|
1447
|
-
| Scale target today |
|
|
1495
|
+
| Scale target today | Local exact mode for small streams; FAISS/Qdrant/pgvector plus replicated namespaces for production paths | Larger than WaveMind local exact mode | Production vector scale |
|
|
1448
1496
|
|
|
1449
1497
|
WaveMind is not trying to replace dedicated vector databases at scale. The intended product gap is dynamic priority: frequently used memories can become hotter while old or low-priority memories fade. For static RAG over large document collections, use a mature vector database. For memory that needs persistence, scoped recall, TTL, forgetting, and reinforcement, WaveMind is designed to sit above or beside the vector index.
|
|
1450
1498
|
|
|
@@ -1472,8 +1520,9 @@ If you already use Chroma for local memory, see the practical migration guide:
|
|
|
1472
1520
|
from SQLite on load/build, so large service-mode deployments still need a
|
|
1473
1521
|
measured rebuild strategy and index-health monitoring.
|
|
1474
1522
|
- The persisted FAISS backend validates a snapshot against current memory ids
|
|
1475
|
-
and avoids unnecessary FAISS rebuilds when the snapshot matches.
|
|
1476
|
-
a single-node flat-index path
|
|
1523
|
+
and avoids unnecessary FAISS rebuilds when the snapshot matches. FAISS itself
|
|
1524
|
+
is a single-node flat-index path; use `ReplicatedWaveMind` or external
|
|
1525
|
+
database/service replication when that is not enough.
|
|
1477
1526
|
- The `quantized` backend is an explicit int8 candidate-index experiment. It
|
|
1478
1527
|
reduces vector precision and must be benchmarked per workload before use.
|
|
1479
1528
|
- The synthetic long-term memory evidence benchmark is useful for regression and product-shape proof, but public claims should lean on LoCoMo and LongMemEval instead.
|
|
@@ -18,9 +18,10 @@ This is a compact reader-facing view of checked-in benchmark results. It is not
|
|
|
18
18
|
| [LongMemEval evidence 50-query smoke](https://github.com/xiaowu0162/LongMemEval) | long-term-agent-memory | evidence recall@k | WaveMind: 0.92 / 15.3 ms | Static vector: 0.6 / 0.337 ms | WaveMind leads on quality |
|
|
19
19
|
| [ANN index latency curve](https://github.com/erikbern/ann-benchmarks) | index-latency | Recall@k | WaveMind numpy: 1 / 6.485 ms | Qdrant local: 1 / 43.5 ms | Quality tie; WaveMind faster |
|
|
20
20
|
| Production index profile | index-latency | Recall@k | WaveMind faiss-persisted: 1 / 3.524 ms | Qdrant service: 1 / 4.414 ms | Quality tie; WaveMind faster |
|
|
21
|
-
| Production load profile 100k | production-scale | Recall@k | WaveMind pgvector: 0.736 / 17.8 ms | Qdrant service: 1 / 10.
|
|
22
|
-
| Production load profile 1M | production-scale | Recall@k | - | Qdrant service: 0.
|
|
23
|
-
|
|
|
21
|
+
| Production load profile 100k | production-scale | Recall@k | WaveMind pgvector: 0.736 / 17.8 ms | Qdrant service: 1 / 10.3 ms | Baseline leads on quality |
|
|
22
|
+
| Production load profile 1M | production-scale | Recall@k | - | Qdrant service: 0.984 / 116.8 ms | No WaveMind result |
|
|
23
|
+
| Qdrant 1M HNSW ef sweep | production-scale | Recall@k | - | hnsw_ef=2048: 0.977 / 64.8 ms | No WaveMind result |
|
|
24
|
+
| Scale readiness profile | production-scale | precision@1 | WaveMind structured payloads: 1 / 0.837 ms | - | WaveMind-only check |
|
|
24
25
|
| Memory competitor adapter profile | agent-memory | precision@1 | WaveMind: 0.8 / 0.554 ms | - | WaveMind-only check |
|
|
25
26
|
| [LongMemEval answer generation](https://github.com/xiaowu0162/LongMemEval) | long-term-agent-memory | token F1 | WaveMind + qwen2.5:1.5b: 0.333 / - | Chroma static + qwen2.5:1.5b: 0.17 / - | WaveMind leads on quality |
|
|
26
27
|
|
|
@@ -21,9 +21,10 @@ Planned rows are not claimed wins. They are the public proof path WaveMind must
|
|
|
21
21
|
| [LongMemEval evidence 50-query smoke](https://github.com/xiaowu0162/LongMemEval) | long-term-agent-memory | implemented | WaveMind: evidence recall@k 0.92, precision@1 0.76, MRR@k 0.83, context saved 0.87, avg latency 15.3, p95 latency 42.9<br>Chroma static: evidence recall@k 0.60, precision@1 0.26, MRR@k 0.39, context saved 0.89, avg latency 13.3, p95 latency 26.0<br>Static vector: evidence recall@k 0.60, precision@1 0.26, MRR@k 0.39, context saved 0.89, avg latency 0.34, p95 latency 2.14<br>Qdrant static: evidence recall@k 0.60, precision@1 0.26, MRR@k 0.39, context saved 0.89, avg latency 180.2, p95 latency 296.7 | Speed up full LongMemEval reruns by reusing per-question candidate indexes or adding a streaming runner mode. |
|
|
22
22
|
| [ANN index latency curve](https://github.com/erikbern/ann-benchmarks) | index-latency | implemented | WaveMind numpy: Recall@k 1.00, avg latency 6.49, p95 latency 6.41, build ms 744.7<br>WaveMind quantized: Recall@k 0.93, avg latency 24.9, p95 latency 37.4, build ms 2088.7<br>WaveMind annoy: Recall@k 0.73, avg latency 4.92, p95 latency 7.37, build ms 4090.1<br>WaveMind faiss: skipped - Install faiss-cpu to use FaissVectorIndex<br>Qdrant local: Recall@k 1.00, avg latency 43.5, p95 latency 59.7, build ms 17525.7 | Tune quantized search kernels, add FAISS on Linux/macOS CI, and test Qdrant service-mode curves beyond 50000 vectors. |
|
|
23
23
|
| Production index profile | index-latency | implemented | WaveMind faiss-persisted: Recall@k 1.00, avg latency 3.52, p95 latency 7.88, build ms 715.9<br>Qdrant service: Recall@k 1.00, avg latency 4.41, p95 latency 5.93, build ms 12269.8<br>WaveMind pgvector: Recall@k 0.81, avg latency 10.9, p95 latency 15.7, build ms 185048.9 | Use the dedicated production load profile for 100000 and 1000000-vector service tests, then tune pgvector and Qdrant for recall/latency. |
|
|
24
|
-
| Production load profile 100k | production-scale | implemented | Qdrant service: Recall@k 1.00, avg latency 10.
|
|
25
|
-
| Production load profile 1M | production-scale | implemented | Qdrant service: Recall@k 0.
|
|
26
|
-
|
|
|
24
|
+
| Production load profile 100k | production-scale | implemented | Qdrant service: Recall@k 1.00, avg latency 10.3, p95 latency 19.0, p99 latency ms 21.3, build ms 27439.3<br>WaveMind pgvector: Recall@k 0.74, avg latency 17.8, p95 latency 23.5, build ms 455703.7<br>WaveMind faiss-persisted: skipped - Set WAVEMIND_FAISS_PATH to use the persisted FAISS backend | Tune pgvector HNSW build/search parameters and add persisted FAISS from the Linux benchmark container. |
|
|
25
|
+
| Production load profile 1M | production-scale | implemented | Qdrant service: Recall@k 0.98, avg latency 116.8, p95 latency 153.8, p99 latency ms 209.3, build ms 450674.6 | Tune Qdrant indexing/search params further, then add FAISS IVF/HNSW and pgvector 1M profiles on a larger disk. |
|
|
26
|
+
| Qdrant 1M HNSW ef sweep | production-scale | implemented | hnsw_ef=512: Recall@k 0.75, avg latency 47.2, p95 latency 68.5, p99 latency ms 68.5, max latency ms 68.5<br>hnsw_ef=768: Recall@k 0.85, avg latency 44.0, p95 latency 69.1, p99 latency ms 69.8, max latency ms 69.8<br>hnsw_ef=1024: Recall@k 0.88, avg latency 62.9, p95 latency 81.1, p99 latency ms 85.5, max latency ms 85.5<br>hnsw_ef=1536: Recall@k 0.94, avg latency 65.6, p95 latency 111.2, p99 latency ms 119.7, max latency ms 119.7<br>hnsw_ef=2048: Recall@k 0.98, avg latency 64.8, p95 latency 91.2, p99 latency ms 103.8, max latency ms 103.8 | Repeat with 100+ queries and collection-level HNSW build parameters before claiming a stable production SLO. |
|
|
27
|
+
| Scale readiness profile | production-scale | implemented | WaveMind cluster planner: simulated memories 1000000, namespaces 4096, nodes 4, replication factor 2, node loss min availability 1.00, zone loss min availability 1.00, read quorum 1, write quorum 2, placement ms 115.8<br>WaveMind hot cache: queries 2000, capacity 512, hit rate 0.92, evictions 0, p99 lookup ms 0.00<br>WaveMind structured payloads: queries 4, precision@1 1.00, avg latency 0.84, p99 latency ms 1.07 | Move from single-node service profiles to namespace sharding and replicated service runs. |
|
|
27
28
|
| Memory competitor adapter profile | agent-memory | implemented | WaveMind: precision@1 0.80, precision@3 1.00, stale suppression 1.00, avg latency 0.55, p95 latency 0.83<br>Mem0: skipped - Install Mem0 to run this adapter profile: pip install "mem0ai"<br>Zep: skipped - Install the Zep client package and set ZEP_API_KEY or ZEP_API_URL.<br>LangGraph persistent memory: skipped - Install LangGraph to run this adapter profile: pip install "langgraph" | Add documented setup commands for each competitor adapter and store checked-in results only when those real adapters run. |
|
|
28
29
|
| [LongMemEval answer generation](https://github.com/xiaowu0162/LongMemEval) | long-term-agent-memory | implemented | extractive smoke: queries 20, evidence recall@k 1.00, exact match 0.00, contains answer 0.05, token f1 0.02, avg retrieval ms 3.79, avg generation ms 0.77<br>WaveMind + qwen2.5:0.5b: queries 50, evidence recall@k 0.92, exact match 0.12, contains answer 0.18, token f1 0.18, avg retrieval ms 2.98, avg generation ms 1428.2<br>Chroma static + qwen2.5:0.5b: queries 50, evidence recall@k 0.60, exact match 0.10, contains answer 0.12, token f1 0.13, avg retrieval ms 4.10, avg generation ms 1234.7<br>Qdrant static + qwen2.5:0.5b: queries 50, evidence recall@k 0.60, exact match 0.10, contains answer 0.12, token f1 0.13, avg retrieval ms 63.8, avg generation ms 893.5<br>WaveMind + qwen2.5:1.5b: queries 50, evidence recall@k 0.92, exact match 0.24, contains answer 0.38, token f1 0.33, avg retrieval ms 2.00, avg generation ms 2153.0<br>Chroma static + qwen2.5:1.5b: queries 50, evidence recall@k 0.60, exact match 0.12, contains answer 0.16, token f1 0.17, avg retrieval ms 7.05, avg generation ms 2082.4<br>Qdrant static + qwen2.5:1.5b: queries 50, evidence recall@k 0.60, exact match 0.12, contains answer 0.16, token f1 0.17, avg retrieval ms 100.2, avg generation ms 758.1 | Run all 470 non-abstention questions with a stronger local/API model and add faithfulness/abstention scoring. |
|
|
29
30
|
|