PyPI - wavemind - Versions diffs - 2.2.1__tar.gz → 2.2.2__tar.gz - Mend

wavemind-2.2.1.tar.gz → wavemind-2.2.2.tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (160)

wavemind-2.2.2/Dockerfile ADDED

@@ -0,0 +1,28 @@
+FROM python:3.11-slim
+ARG INSTALL_OPTIONAL=false
+ARG INSTALL_OTEL=false
+ARG INSTALL_PRODUCTION=false
+ENV PYTHONDONTWRITEBYTECODE=1
+ENV PYTHONUNBUFFERED=1
+ENV WAVEMIND_DB=/data/wavemind.sqlite3
+ENV WAVEMIND_LOG_LEVEL=INFO
+WORKDIR /app
+RUN if [ "$INSTALL_OPTIONAL" = "true" ] || [ "$INSTALL_PRODUCTION" = "true" ]; then apt-get update && apt-get install -y --no-install-recommends build-essential && rm -rf /var/lib/apt/lists/*; fi
+COPY README.md pyproject.toml requirements.txt requirements-optional.txt ./
+RUN pip install --no-cache-dir -r requirements.txt \
+    && if [ "$INSTALL_OPTIONAL" = "true" ]; then pip install --no-cache-dir -r requirements-optional.txt; fi \
+    && if [ "$INSTALL_OTEL" = "true" ]; then pip install --no-cache-dir "opentelemetry-api>=1.25" "opentelemetry-sdk>=1.25" "opentelemetry-exporter-otlp>=1.25" "opentelemetry-instrumentation-fastapi>=0.46b0"; fi
+COPY wavemind ./wavemind
+COPY wavemind_v2.py ./wavemind_v2.py
+RUN if [ "$INSTALL_PRODUCTION" = "true" ]; then pip install --no-cache-dir -e ".[production]"; else pip install --no-cache-dir -e .; fi
+VOLUME ["/data", "/backups"]
+EXPOSE 8000
+CMD ["uvicorn", "wavemind.api:create_app", "--factory", "--host", "0.0.0.0", "--port", "8000"]
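
Note on the added Dockerfile: each of the three `ARG` flags (`INSTALL_OPTIONAL`, `INSTALL_OTEL`, `INSTALL_PRODUCTION`) is compared against the literal string `true`, so any other value leaves that install step off. The sketch below is not part of the package. It only shows how a build invocation could be assembled from those flags using the standard `docker build --build-arg` syntax. The helper name `docker_build_command` and the image tag `wavemind:local` are hypothetical.

```python
from typing import List


def docker_build_command(
    tag: str,
    optional: bool = False,
    otel: bool = False,
    production: bool = False,
    context: str = ".",
) -> List[str]:
    """Assemble a `docker build` argv for the Dockerfile's three ARG flags.

    Hypothetical helper for illustration; it does not run Docker itself.
    """
    flags = {
        "INSTALL_OPTIONAL": optional,
        "INSTALL_OTEL": otel,
        "INSTALL_PRODUCTION": production,
    }
    cmd = ["docker", "build", "-t", tag]
    for name, enabled in flags.items():
        # The Dockerfile tests each ARG with `[ "$NAME" = "true" ]`,
        # so the value must be the exact lowercase string "true".
        cmd += ["--build-arg", f"{name}={'true' if enabled else 'false'}"]
    cmd.append(context)
    return cmd


if __name__ == "__main__":
    # Print the command for a build with only the OpenTelemetry extras enabled.
    print(" ".join(docker_build_command("wavemind:local", otel=True)))
```

Passing the resulting list to `subprocess.run` would start the build. Returning a list rather than a shell string avoids quoting problems if the tag or context path contains spaces.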

{wavemind-2.2.1 → wavemind-2.2.2}/MANIFEST.in RENAMED

@@ -14,13 +14,18 @@ include docs/RELEASE.md
 include docs/PROJECT_BOARD.md
 include docs/DEMO_SCRIPT.md
 include docs/LAUNCH_KIT.md
+include docs/BENCHMARK_BRIEF.md
 include docs/CHROMA_MIGRATION.md
+include docs/OBSERVABILITY.md
 include docs/RU_LAUNCH_POSTS.md
 include docs/USE_CASES.md
 include docs/assets/benchmark-summary.svg
 include docs/assets/wavemind-social-card.svg
+include docs/assets/wavemind-demo.gif
 include benchmarks/*.py
 include benchmarks/*.json
 include benchmarks/*.md
 include examples/*.py
+recursive-include examples/observability *
+recursive-include examples/production-index-profile *
 prune benchmarks/data

{wavemind-2.2.1 → wavemind-2.2.2}/PKG-INFO RENAMED

@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: wavemind
-Version: 2.2.1
+Version: 2.2.2
 Summary: Local-first dynamic memory field with vector search and wave-field re-ranking
 License-Expression: MIT
 Project-URL: Homepage, https://github.com/CaspianG/wavemind
@@ -66,6 +66,8 @@ users or projects isolated.
 <img src="https://raw.githubusercontent.com/CaspianG/wavemind/main/docs/assets/wavemind-social-card.svg" alt="WaveMind dynamic memory overview" width="820">
+<img src="https://raw.githubusercontent.com/CaspianG/wavemind/main/docs/assets/wavemind-demo.gif" alt="WaveMind dynamic memory terminal demo" width="820">
 [Quick Start](#quick-start) |
 [CLI](#cli-cheat-sheet) |
 [Studio](#wavemind-studio) |
@@ -77,6 +79,8 @@ users or projects isolated.
 [Use Cases](docs/USE_CASES.md) |
 [HTTP API](#http-api) |
 [Benchmarks](#benchmark) |
+[Benchmark Brief](docs/BENCHMARK_BRIEF.md) |
+[Research Branches](#research-branches) |
 [Roadmap](#roadmap) |
 [Contributing](#contributing) |
 [Limitations](#known-limitations)
@@ -159,6 +163,7 @@ Start here if you only want to use WaveMind from the terminal:
 | Show first-run help | `wavemind quickstart` |
 | Store a memory | `wavemind remember "Andrey prefers short answers" --namespace user:42` |
 | Search memory | `wavemind query "answer style" --namespace user:42` |
+| Consolidate active patterns | `wavemind consolidate --namespace user:42 --seed "Rust compiler systems"` |
 | Open local dashboard | `wavemind studio` |
 | See stored state | `wavemind stats --namespace user:42` |
 | Delete a namespace | `wavemind forget --namespace user:42` |
@@ -272,11 +277,12 @@ wavemind --db ./state/app_memory.sqlite3 query "answer style" --namespace user:4
 | CrewAI or AutoGen loop | The adapters in `wavemind.integrations` |
 | Node, Go, Ruby, PHP, or no-code app | `wavemind serve` and the HTTP API |
 | Personal knowledge base | Store notes by project namespace and query locally |
-| Support or CRM workflow | Store customer issues, resolutions, preferences, and corrections |
-| Research or trading notebook | Store observations with source metadata and TTL for temporary hypotheses |
+| Support or CRM workflow | Customer issues, resolutions, preferences, corrections, TTL, and namespace isolation. See [`examples/customer_support_memory.py`](examples/customer_support_memory.py). |
+| Research or analyst notebook | Findings, hypotheses, decisions, source metadata, TTL, and project isolation. See [`examples/research_notebook_memory.py`](examples/research_notebook_memory.py). |
 For migrations from existing local vector memory, start with
-[`docs/CHROMA_MIGRATION.md`](docs/CHROMA_MIGRATION.md).
+[`docs/CHROMA_MIGRATION.md`](docs/CHROMA_MIGRATION.md). The guide has a tested
+offline fixture at [`examples/chroma_migration.py`](examples/chroma_migration.py).
 ## Minimal Agent Loop
@@ -322,6 +328,24 @@ python examples/dynamic_memory_demo.py
 That demo shows corrected facts outranking stale facts, temporary memory
 expiring, namespace isolation, and index-health reporting.
+To see the same behavior in a practical support/CRM workflow:
+```sh
+python examples/customer_support_memory.py
+```
+That demo stores customer preferences, billing tickets, stale CRM data,
+temporary discount codes, and separate customer namespaces.
+To see source-aware research memory:
+```sh
+python examples/research_notebook_memory.py
+```
+That demo stores analyst findings, temporary hypotheses, decisions, source
+metadata, and isolated project namespaces.
 ## How The Memory Field Works
 ```mermaid
@@ -335,6 +359,8 @@ flowchart LR
     R --> P["app, search UI, prompt, API, or tool"]
     P --> F["recall feedback updates hotness / priority"]
     F --> D
+    F --> C["consolidate active clusters"]
+    C --> D
 ```
 The wave field is the dynamic layer around stored memories. It is not a
@@ -350,12 +376,19 @@ memories should still matter.
 | TTL | This fact is temporary. | Drops out after expiry. |
 | namespace and tags | This belongs to one user/project/type. | Prevents cross-user or cross-topic leakage. |
 | graph dynamics | Related memories can excite or inhibit each other. | Helps clusters and corrections behave like memory, not a flat list. |
+| consolidation | Active clusters can become durable concept memories. | Turns repeated patterns into inspectable higher-level memories with provenance. |
 Technically, the current `MemoryFieldGraph` is a discrete graph over stored
 memories, not a continuous mathematical physics field. That honesty matters:
 WaveMind is useful today as a dynamic memory engine, while the research path is
 to make the field dynamics more explicit, measurable, and scalable.
+Self-organization is now part of the core surface. `consolidate_concepts()`,
+`wavemind consolidate`, and `POST /consolidate` can turn an active graph cluster
+into a new stored memory such as `Consolidated memory: systems...` without an
+LLM call. The generated memory keeps the source memory ids in metadata, so it is
+auditable instead of being a hidden summary.
 ## Optional Embeddings
 For sentence-transformer embeddings:
@@ -418,6 +451,10 @@ Optional pgvector environment variables:
 - `WAVEMIND_PGVECTOR_COLLECTION` - collection key, default `default`.
 - `WAVEMIND_PGVECTOR_CREATE_HNSW=1` - create an HNSW index using
   `vector_cosine_ops` when the installed pgvector version supports it.
+- `WAVEMIND_PGVECTOR_HNSW_M` - optional HNSW graph degree for index creation.
+- `WAVEMIND_PGVECTOR_HNSW_EF_CONSTRUCTION` - optional HNSW build accuracy setting.
+- `WAVEMIND_PGVECTOR_EF_SEARCH` - optional per-query HNSW search depth. Increase
+  it when pgvector is fast but recall is too low.
 If `WAVEMIND_PGVECTOR_DSN` is missing, WaveMind raises a clear error instead of
 silently falling back to another index backend.
@@ -438,6 +475,125 @@ production latency and durability should be measured against a real Qdrant
 service. If `WAVEMIND_QDRANT_URL` is missing, WaveMind raises a clear error
 instead of silently falling back to another backend.
+## Scale Readiness
+WaveMind now includes an explicit scale preflight:
+```sh
+wavemind scale-plan --target-memories 50000
+```
+For JSON output in CI or deployment checks:
+```sh
+wavemind --db ./state/wavemind.sqlite3 scale-plan --target-memories 50000 --json
+```
+To fail a deployment preflight when the plan needs action:
+```sh
+wavemind --db ./state/wavemind.sqlite3 scale-plan --target-memories 50000 --fail-on action_required --json
+```
+If you only want a plan for a future size without loading optional index
+packages:
+```sh
+wavemind --index faiss scale-plan --current-memories 10000 --target-memories 50000 --json
+```
+The scale plan reports:
+| field | meaning |
+|---|---|
+| `tier` | `small`, `medium`, `large-local`, `production-service`, or `million-plus`. |
+| `status` | `ok`, `watch`, `action_required`, or `architecture_required`. |
+| `recommended_index` | The candidate-index class to use before growth. |
+| `warnings` | Why the current path may fail at the target size. |
+| `actions` | Concrete setup, benchmark, rebuild, and index-health steps. |
+The same scale preflight is available over HTTP:
+```sh
+curl "http://127.0.0.1:8000/scale-plan?target_memories=50000"
+```
+Rule of thumb:
+| target memories | recommended path |
+|---:|---|
+| up to 1000 | SQLite + NumPy exact index. |
+| 1000 to 5000 | NumPy can work, but benchmark real queries. |
+| 5000 to 50000 | Persisted FAISS for local single-node, or Qdrant service. |
+| 50000 to 1M | Service-backed candidate index, namespace sharding, measured p95/p99. |
+| above 1M | External vector database plus WaveMind as the memory-policy layer. |
+Scale readiness profile:
+```sh
+python benchmarks/scale_readiness_benchmark.py --simulated-memories 1000000
+```
+Checked-in result:
+| profile | result |
+|---|---:|
+| Cluster planner | 4096 namespaces, 4 nodes, replication factor 2, single-node loss availability `1.000`. |
+| Hot cache | 2000 lookups, hit rate `0.920`, p99 lookup `0.01 ms`. |
+| Structured payloads | image/audio/table/event retrieval, precision@1 `1.000`, p99 `1.27 ms`. |
+This profile validates routing, cache behavior, and structured payload handling.
+It is not a 10M-vector load test. Real 100k, 1M, and 10M latency claims should
+come from service-backed FAISS/Qdrant/pgvector load tests on production-like
+hardware.
+Cluster placement planning:
+```sh
+wavemind cluster-plan \
+  --namespace-count 4096 \
+  --node node-a=10.0.0.1:8000 \
+  --node node-b=10.0.0.2:8000 \
+  --node node-c=10.0.0.3:8000 \
+  --replication-factor 2 \
+  --kubernetes \
+  --json
+```
+This uses deterministic rendezvous placement so each namespace has a primary
+and replica set. The emitted Kubernetes StatefulSet manifest is a deployment
+starting point; it does not claim Raft consensus or automatic distributed
+SQLite writes.
+The same planner is available over HTTP as `POST /cluster-plan`.
+## Structured And Multimodal Memory
+WaveMind can store non-text memories as structured text plus metadata. This is
+useful for product events, tables, call transcripts, and image/audio captions
+while keeping the same query API.
+```python
+from wavemind import WaveMind, image_payload, remember_payload
+memory = WaveMind()
+remember_payload(
+    memory,
+    image_payload("s3://demo/chart.png", caption="enterprise revenue expansion chart"),
+    namespace="research",
+)
+print(memory.query("enterprise expansion chart", namespace="research")[0].metadata)
+```
+Supported payload helpers:
+| helper | use case |
+|---|---|
+| `image_payload()` | image URI plus caption or alt text |
+| `audio_payload()` | audio URI plus transcript or summary |
+| `table_payload()` | compact table preview with row count |
+| `event_payload()` | structured product, user, or system event |
 ## Storage Backends
 SQLite is the default source of truth. For multi-tenant production deployments,
@@ -512,18 +668,23 @@ curl http://127.0.0.1:8000/audit?namespace=demo
 curl http://127.0.0.1:8000/metrics
 curl http://127.0.0.1:8000/observability
 curl http://127.0.0.1:8000/index/health
+curl "http://127.0.0.1:8000/scale-plan?target_memories=50000"
 curl -X POST http://127.0.0.1:8000/index/rebuild
+curl -X POST http://127.0.0.1:8000/consolidate -H "Content-Type: application/json" -d '{"namespace":"demo","seed_text":"Rust compiler systems","min_energy":0.01}'
 curl -X POST http://127.0.0.1:8000/backup -H "Content-Type: application/json" -d '{"path":"./backups","keep_last":7}'
 ```
 `/audit` returns mutation events such as `remember`, `forget`, `backup`, and
-`purge_expired`. Query audit is opt-in with `WAVEMIND_AUDIT_QUERIES=1` because
+`consolidate_concept`. Query audit is opt-in with `WAVEMIND_AUDIT_QUERIES=1` because
 writing an audit row for every query changes latency. `/metrics` returns a
 Prometheus-compatible text payload without adding a required dependency.
 `/index/health` reports source-of-truth versus candidate-index consistency.
 `/index/rebuild` rebuilds the candidate index from stored active memories and
 logs an `index_rebuild` audit event.
+Full observability guide and local Prometheus/OTEL examples:
+[`docs/OBSERVABILITY.md`](docs/OBSERVABILITY.md).
 OpenTelemetry traces are optional and off by default:
 ```sh
@@ -642,11 +803,17 @@ Framework examples in this repository:
 | LangChain memory | `examples/langchain_memory.py` |
 | OpenAI/OpenRouter-style agent loop | `examples/agent_with_memory.py` |
 | LangGraph hooks | `wavemind.integrations.langgraph`, `examples/framework_integrations.py` |
-| LlamaIndex-style retriever | `wavemind.integrations.llamaindex`, `examples/framework_integrations.py` |
+| LlamaIndex-style retriever | `wavemind.integrations.llamaindex`, `examples/llamaindex_retriever.py` |
 | CrewAI-style tools | `wavemind.integrations.crewai`, `examples/framework_integrations.py` |
 | AutoGen-style hooks | `wavemind.integrations.autogen`, `examples/framework_integrations.py` |
 | Namespace sharding | `examples/sharded_memory.py` |
+Run the dedicated offline LlamaIndex-style retriever example:
+```sh
+python examples/llamaindex_retriever.py
+```
 ## OpenClaw Integration
 [OpenClaw memory](https://docs.openclaw.ai/concepts/memory) is file-centered:
@@ -782,6 +949,18 @@ memory benchmark:
 In short: static vector search answers "what is nearest?" Dynamic memory also
 asks "what is still relevant, reinforced, scoped, and allowed to be remembered?"
+## Research Branches
+The main branch stays focused on the core WaveMind library: dynamic memory,
+storage, indexes, APIs, integrations, and public memory benchmarks.
+Experimental domains live in separate branches so they can move quickly without
+overloading the main README:
+| Branch | Scope |
+|---|---|
+| [`research/crypto-pattern-memory`](https://github.com/CaspianG/wavemind/tree/research/crypto-pattern-memory) | OHLCV pattern-memory research, historical analogue retrieval, and future backtest experiments. |
 ## Benchmark
 WaveMind tracks benchmarks in two layers:
@@ -791,6 +970,7 @@ WaveMind tracks benchmarks in two layers:
 Machine-readable benchmark matrix: `benchmarks/benchmark_matrix_results.json`.
 Full generated benchmark report: [`benchmarks/BENCHMARK_REPORT.md`](benchmarks/BENCHMARK_REPORT.md).
+Compact benchmark leaderboard: [`benchmarks/BENCHMARK_LEADERBOARD.md`](benchmarks/BENCHMARK_LEADERBOARD.md).
 Visual summary generated from the checked-in JSON results:
@@ -828,14 +1008,19 @@ Current read:
 |---|---|---|
 | Public agent-memory evidence | On official LoCoMo `locomo10.json`, WaveMind reaches `evidence_recall@5 0.386` with hash embeddings and `0.547` with sentence-transformers. Fair namespace-filtered Chroma reaches `0.257` / `0.407`; Qdrant reaches `0.263` / `0.409`. | WaveMind retrieves more labeled evidence. Chroma is still the fastest static vector-store baseline. Qdrant local payload filtering is much slower than service-mode Qdrant should be. |
 | Public retrieval sanity check | On BEIR SciFact, WaveMind reaches `nDCG@10 0.354`, `Recall@10 0.482`; Qdrant matches that quality; Chroma reaches `0.350` / `0.467` with identical hash embeddings. | Same-embedding retrieval quality is close. Chroma is fastest at `1.79 ms`; Qdrant local is `17.71 ms`; WaveMind exact path is `117.02 ms`. |
+| Public multilingual retrieval | On NoMIRACL Russian, sampled at 200 queries / 5000 compact candidate passages, WaveMind reaches `nDCG@10 0.434`, `Recall@10 0.516`, matching Qdrant and staying within `0.002` nDCG of Chroma on identical hash embeddings. | Russian same-embedding quality is at parity. Chroma is faster at `2.60 ms`; WaveMind is `10.22 ms`; Qdrant local is `18.86 ms`. |
 | Static agent recall | WaveMind `precision@1` equals Chroma at `0.82`; WaveMind `precision@3` is `0.90` vs Chroma `0.88`. | Competitive quality, but Chroma is faster on the static vector-store path. |
 | Dynamic memory policy | WaveMind reaches `1.00` stale suppression; Chroma static is `0.00`. | This is the strongest current differentiation: hotness, TTL, corrections, and namespaces. |
-| Field memory dynamics | Graph-enabled WaveMind reaches `1.00` `precision@1`, `1.00` stale suppression, and `1.00` concept formation vs static WaveMind at `0.20` / `0.20` / `0.00`. | This is still synthetic, but it is the first regression check for memory-to-memory excitation, conflict inhibition, and decay. |
+| Field memory dynamics | Graph-enabled WaveMind reaches `1.00` `precision@1`, `1.00` stale suppression, `1.00` concept formation, and `1.00` durable concept consolidation vs static WaveMind at `0.20` / `0.20` / `0.00` / `0.00`. | This is still synthetic, but it is now a regression check for memory-to-memory excitation, conflict inhibition, decay, and self-organization into auditable concept memories. |
 | Long-term evidence | WaveMind reaches `1.00` evidence recall@5, `1.00` precision@1, and `1.00` stale suppression on the synthetic long-memory evidence benchmark. | This is the first proof-shaped benchmark for agent memory: it measures whether stale/corrected/expired/cross-user facts stay out of retrieved evidence. |
 | Capacity | Static `precision@1` is `0.94` at 5000 memories; dynamic policy keeps `1.00` on the current checks. | Quality is holding on these checks, but dynamic latency must be optimized. |
 | LongMemEval full retrieval | On the official LongMemEval-S cleaned file, 470 non-abstention session-level questions, WaveMind reaches `evidence_recall@5 0.782` and `precision@1 0.696`; Chroma static reaches `0.518` / `0.355`; Qdrant static reaches `0.520` / `0.355`. | This is now the strongest public memory result in the repo. It is retrieval-only, not final answer quality. |
+| LongMemEval 50-query smoke | On the first 50 non-abstention LongMemEval-S questions, WaveMind reaches `evidence_recall@5 0.920`, `precision@1 0.760`, and `MRR@5 0.827`; Chroma/Qdrant static reach `0.600`, `0.260`, and `0.385`. | This is the fast regression profile for checking current changes before rerunning the full LongMemEval profile. WaveMind wins on quality; latency still needs work. |
 | ANN/index curve | At 50000 generated 128-d vectors, NumPy exact keeps `recall@10 1.000` at `6.49 ms`; quantized int8 keeps `0.934` at `24.92 ms`; Annoy is faster at `4.92 ms` but drops to `0.730` recall; Qdrant local keeps `1.000` recall at `43.49 ms`. | Current local scale boundary is clear: quantized search needs kernel work, Annoy needs tuning/FAISS, and Qdrant should be tested in service mode for a fair production comparison. |
-| Next public proof | LongMemEval / LoCoMo answer generation with a local LLM. | Retrieval is now measured. The next serious number should test answer accuracy, abstention, and faithfulness. |
+| Production load | At 100000 generated 128-d vectors, service-mode Qdrant reaches `recall@10 1.000`, avg `10.76 ms`; pgvector HNSW reaches `0.736`, avg `17.76 ms`; at 1M vectors Qdrant reaches `0.506`, avg `45.81 ms`. | Qdrant service is already usable at 100k. The 1M result is not production-grade yet: large-N service settings need tuning before claiming million-memory recall. |
+| Scale readiness | Deterministic 1M-memory simulation validates 4096 namespace placements over 4 nodes with replication factor 2, single-node-loss availability `1.000`, hot-cache hit rate `0.920`, and structured payload precision@1 `1.000`. | This proves routing/cache/payload foundations, not a 10M-vector load-test claim. Real 100k-10M production latency needs service-backed load tests. |
+| Memory competitor adapters | WaveMind reaches `precision@1 0.80`, `precision@3 1.00`, stale suppression `1.00` on the small adapter profile. Mem0, Zep, and LangGraph are listed as skipped unless their real packages/services are configured. | This prevents fake competitor claims. The adapter harness is ready; real Mem0/Zep/LangGraph results still need configured installs. |
+| LongMemEval local answer generation | With the same local Ollama `qwen2.5:1.5b`, WaveMind reaches `exact_match 0.240`, `contains_answer 0.380`, `token_f1 0.333`, and `evidence_recall@5 0.920`; Chroma and Qdrant static both reach `0.120`, `0.160`, `0.170`, and `0.600`. | This is the first checked-in end-to-end answer benchmark against Chroma/Qdrant. It is still a 50-question lightweight smoke run, not a full LongMemEval leaderboard score. |
 ### Real Benchmark Matrix
@@ -843,17 +1028,22 @@ Current read:
 |---|---|---|---|---|
 | Agent user-memory retrieval | Natural-language recall over 200 user facts. | implemented | Chroma | Match Chroma `precision@1`, beat `precision@3`, stay under 5 ms at 200 memories. |
 | Dynamic memory policy | Hot memory, TTL, corrections, stale suppression, namespace isolation. | implemented | Chroma static | Keep `precision@1` and stale suppression at 1.00, cut avg latency below 10 ms at 1000 memories. |
-| Field memory graph dynamics | Related memories excite each other, newer conflicting memories suppress stale facts, graph energy decays, and active clusters expose concept candidates. | implemented | WaveMind static | Keep `precision@1`, stale suppression, and concept formation at 1.00 while moving from synthetic checks to LoCoMo/LongMemEval evidence. |
+| Field memory graph dynamics | Related memories excite each other, newer conflicting memories suppress stale facts, graph energy decays, and active clusters can become durable concept memories. | implemented | WaveMind static | Keep `precision@1`, stale suppression, concept formation, and concept consolidation at 1.00 while moving from synthetic checks to LoCoMo/LongMemEval evidence. |
 | WaveMind capacity curve | How recall and latency change at 200 / 1000 / 5000 memories. | implemented | WaveMind-only today | Keep `precision@1 >= 0.95` at 5000 memories and dynamic latency below 20 ms. |
 | Long-term memory evidence | Evidence retrieval from long histories with profile, preference, correction, TTL, namespace, and filler noise. | implemented | Static vector / Chroma / Qdrant | Keep this as a small regression test while public LoCoMo and LongMemEval runners carry the stronger evidence claims. |
 | BEIR-style open retrieval runner | Public `corpus.jsonl`, `queries.jsonl`, `qrels/*.tsv` datasets with the same metrics for each engine. | implemented | WaveMind / Chroma / Qdrant | Use identical embeddings and report `nDCG@k`, `Recall@k`, `MRR@k`, `precision@1`, and latency. Current checked-in run: BEIR SciFact. |
+| NoMIRACL Russian retrieval | Russian human-annotated multilingual relevance over compact candidate passages. | implemented | WaveMind / Chroma / Qdrant | Keep same-embedding `nDCG@10` at parity, then rerun with sentence-transformers and full MIRACL Russian when disk/service capacity allows it. |
 | ANN/VectorDBBench-style local curve | Recall/latency tradeoff for candidate indexes on generated vectors. | implemented | NumPy exact / quantized int8 / Annoy / Qdrant local | Use this as the local engineering curve; official VectorDBBench remains future work. |
+| Production index profile | Docker-backed 50000-vector profile for persisted FAISS, Qdrant service, and PostgreSQL/pgvector HNSW. | implemented | FAISS / Qdrant service / pgvector | Keep service-mode candidate generation above `0.95` recall@10 and below 10 ms average query latency at 50000 vectors. |
+| Production load profile | 100k and 1M service-backed candidate-index checks. | implemented | Qdrant service / pgvector HNSW / FAISS persisted | 100k Qdrant is strong; 1M Qdrant and pgvector require tuning before production claims. |
+| Scale readiness profile | Cluster placement, single-node-loss simulation, hot-cache behavior, and structured/multimodal payload retrieval. | implemented | Mem0 / Zep / LangGraph persistent memory / GraphRAG target adapters | Use this as production foundation proof before real distributed 100k, 1M, and 10M load tests. |
+| Memory competitor adapter profile | Dynamic-memory scenario wired for external memory frameworks. | implemented | Mem0 / Zep / LangGraph persistent memory | Report real competitor results only when their packages/services are explicitly configured. |
 | [BEIR](https://github.com/beir-cellar/beir) | Standard zero-shot information retrieval quality. | planned | Chroma / Qdrant / FAISS | Stay within 0.02 `nDCG@10` on identical embeddings. |
 | [MTEB Retrieval](https://github.com/embeddings-benchmark/mteb) | Separates encoder quality from retrieval-store quality. | planned | Chroma / Qdrant / FAISS | Prove WaveMind does not reduce same-embedding retrieval quality. |
-| [MIRACL Russian](https://miracl.ai/) | Multilingual retrieval with Russian relevance judgments. | planned | Chroma / Qdrant / FAISS | Reach same-embedding parity on Russian `nDCG@10`. |
+| [MIRACL Russian](https://miracl.ai/) | Multilingual retrieval with Russian relevance judgments. | runner ready | Chroma / Qdrant / FAISS | NoMIRACL Russian compact run is implemented; full-corpus MIRACL Russian remains the next heavier profile. |
 | [VectorDBBench](https://github.com/zilliztech/VectorDBBench) | Vector database insertion/search/filter/cost-performance benchmark. | planned | Chroma / Qdrant / Milvus / Weaviate / Pinecone / FAISS | Use only after WaveMind has a production index path; today it is a memory layer, not a standalone cloud vector DB. |
 | [LoCoMo](https://arxiv.org/abs/2402.17753) | Long conversation memory, temporal consistency, multi-hop recall. Retrieval-only runner is implemented for official `locomo10.json`. | implemented | Static vector / Chroma / Qdrant | Improve answer generation accuracy on top of the stronger sentence-transformers evidence retrieval run. |
-| [LongMemEval](https://arxiv.org/abs/2410.10813) | Long-term assistant memory with updates and abstention. | implemented retrieval, answer runner ready | Static vector / Chroma / Qdrant / Mem0-style memory | Add LLM answer quality and abstention after retrieval. |
+| [LongMemEval](https://arxiv.org/abs/2410.10813) | Long-term assistant memory with updates and abstention. | implemented retrieval + local Ollama answer smoke | Static vector / Chroma / Qdrant / Mem0-style memory | Add stronger LLM answer quality, abstention, and Chroma/Qdrant RAG answer baselines. |
 | [LongMemEval-V2](https://arxiv.org/abs/2605.12493) | Web-agent memory: state recall, dynamic state, workflow gotchas. | planned | AgentRunbook-R / Chroma RAG / Qdrant RAG | Prove WaveMind can retrieve compact evidence from agent trajectories. |
 | [LMEB](https://github.com/KaLM-Embedding/LMEB) | Long-horizon memory embedding tasks beyond normal passage retrieval. | planned | Embedding-only baselines / Chroma / Qdrant | Choose the default semantic encoder using memory-specific tasks. |
 | [RAGBench](https://huggingface.co/datasets/rungalileo/ragbench) | Downstream RAG context and answer quality. | planned | Chroma RAG / Qdrant RAG / Pinecone RAG | Show whether stale-memory suppression improves context relevance. |
@@ -899,6 +1089,36 @@ Qdrant local preserves the same ranking quality and is much faster than the
 WaveMind NumPy exact path. The engineering target is a FAISS/Annoy candidate
 index with WaveMind's dynamic field policy applied only as a top-k re-ranker.
+### NoMIRACL Russian Retrieval
+WaveMind includes a compact multilingual retrieval runner for
+[NoMIRACL](https://huggingface.co/datasets/miracl/nomiracl), the negative-aware
+MIRACL relevance dataset. The checked-in run uses Russian `test.relevant`
+queries and the compact Russian candidate corpus. It is not a full-corpus
+MIRACL run; it is a reproducible multilingual relevance benchmark small enough
+to run on a local machine.
+```sh
+python benchmarks/nomiracl_russian_benchmark.py --download --dataset benchmarks/data/nomiracl-russian --engines wavemind chroma qdrant --top-k 10 --limit-queries 200 --limit-corpus 5000 --output benchmarks/nomiracl_russian_results.json
+```
+Checked-in NoMIRACL Russian result:
+200 Russian queries, 5000 compact candidate passages,
+`HashingTextEncoder`, top-k 10. Full machine-readable result:
+`benchmarks/nomiracl_russian_results.json`.
+| engine | nDCG@10 | Recall@10 | MRR@10 | precision@1 | avg latency | p95 latency |
+|---|---:|---:|---:|---:|---:|---:|
+| WaveMind | 0.434 | 0.516 | 0.489 | 0.410 | 10.22 ms | 15.53 ms |
+| Chroma | 0.435 | 0.519 | 0.490 | 0.410 | 2.60 ms | 3.44 ms |
+| Qdrant | 0.434 | 0.516 | 0.489 | 0.410 | 18.86 ms | 24.08 ms |
+Read this as multilingual same-embedding parity, not as a claim that the hash
+encoder is the best Russian semantic model. The next stronger run should use
+`sentence-transformers` on the same NoMIRACL split, then full MIRACL Russian
+when there is enough disk/service capacity.
 ### LoCoMo Evidence Retrieval
 WaveMind now includes a retrieval-only runner for the public
@@ -1011,18 +1231,35 @@ result: `benchmarks/longmemeval_evidence_results.json`.
 The Chroma and Qdrant baselines now use the same namespace/payload scope as
 WaveMind. Qdrant is run in local embedded mode; the Qdrant client warns that
 local mode is not recommended above 20000 points, so this latency should not be
-read as a service-mode Qdrant result. The next step is answer-quality evaluation
-with a local LLM.
+read as a service-mode Qdrant result.
-Answer-generation runner:
+Answer-generation runner with local Ollama:
 ```sh
-python benchmarks/longmemeval_answer_benchmark.py --dataset benchmarks/data/longmemeval_s_cleaned.json --provider ollama --model YOUR_LOCAL_MODEL --top-k 5 --output benchmarks/longmemeval_answer_results.json
+python benchmarks/longmemeval_answer_benchmark.py --dataset benchmarks/data/longmemeval_s_cleaned.json --provider ollama --model YOUR_LOCAL_MODEL --engines wavemind chroma qdrant --top-k 5 --output benchmarks/longmemeval_answer_results.json
 ```
+Checked-in local answer-generation smoke runs:
+50 non-abstention LongMemEval-S questions, compact retrieved evidence,
+same `HashingTextEncoder`, same local Ollama model, top-k 5. Full machine-readable results:
+`benchmarks/longmemeval_answer_qwen25_0_5b_50_results.json` and
+`benchmarks/longmemeval_answer_qwen25_1_5b_50_results.json`.
+| system | questions | evidence recall@5 | exact match | contains answer | token F1 | avg retrieval | avg generation |
+|---|---:|---:|---:|---:|---:|---:|---:|
+| WaveMind + Ollama `qwen2.5:0.5b` | 50 | 0.920 | 0.120 | 0.180 | 0.183 | 2.98 ms | 1428.20 ms |
+| Chroma static + Ollama `qwen2.5:0.5b` | 50 | 0.600 | 0.100 | 0.120 | 0.126 | 4.10 ms | 1234.69 ms |
+| Qdrant static + Ollama `qwen2.5:0.5b` | 50 | 0.600 | 0.100 | 0.120 | 0.126 | 63.80 ms | 893.48 ms |
+| WaveMind + Ollama `qwen2.5:1.5b` | 50 | 0.920 | 0.240 | 0.380 | 0.333 | 2.00 ms | 2153.00 ms |
+| Chroma static + Ollama `qwen2.5:1.5b` | 50 | 0.600 | 0.120 | 0.160 | 0.170 | 7.05 ms | 2082.38 ms |
+| Qdrant static + Ollama `qwen2.5:1.5b` | 50 | 0.600 | 0.120 | 0.160 | 0.170 | 100.20 ms | 758.11 ms |
 There is also an extractive smoke run that does not require a model:
 `benchmarks/longmemeval_answer_extractive_20_results.json`. It is only a runner
-check, not a meaningful final answer-quality benchmark.
+check, not a meaningful final answer-quality benchmark. The Ollama runs are real
+local LLM runs, but still lightweight smoke results rather than official
+LongMemEval leaderboard scores.
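The three answer columns in the table above can be read through a minimal sketch like the following. It assumes SQuAD-style normalization (lowercasing, dropping punctuation and English articles); the runner's actual normalization is not shown in this diff, and the helper names `normalize`, `exact_match`, `contains_answer`, and `token_f1` are illustrative, not the benchmark's API.

```python
import re
import string
from collections import Counter


def normalize(text: str) -> list[str]:
    """Lowercase, drop punctuation and English articles, split on whitespace.

    Assumption: SQuAD-style normalization, not necessarily what the runner does.
    """
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return text.split()


def exact_match(prediction: str, reference: str) -> float:
    """1.0 when the normalized token sequences are identical, else 0.0."""
    return float(normalize(prediction) == normalize(reference))


def contains_answer(prediction: str, reference: str) -> float:
    """1.0 when the reference appears as a contiguous token run in the prediction."""
    pred = " ".join(normalize(prediction))
    ref = " ".join(normalize(reference))
    # Pad with spaces so matches respect token boundaries.
    return float(bool(ref) and f" {ref} " in f" {pred} ")


def token_f1(prediction: str, reference: str) -> float:
    """Harmonic mean of token-level precision and recall."""
    pred, ref = normalize(prediction), normalize(reference)
    if not pred or not ref:
        return float(pred == ref)
    overlap = sum((Counter(pred) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)
```

Under these definitions exact match is the strictest metric and contains-answer the most lenient, which matches the ordering of the columns in the smoke results.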
 ### ANN Index Curve
@@ -1040,13 +1277,12 @@ Add `qdrant-service` when `WAVEMIND_QDRANT_URL` points at a running Qdrant
 service. Add `faiss-persisted` when `WAVEMIND_FAISS_PATH` points at the FAISS
 snapshot file to validate persisted-index startup behavior.
-Production profile example:
+Reproducible Docker production profile:
 ```sh
-export WAVEMIND_FAISS_PATH="./state/ann-curve.faiss"
-export WAVEMIND_QDRANT_URL="http://localhost:6333"
-export WAVEMIND_PGVECTOR_DSN="postgresql://user:password@localhost:5432/wavemind"
-python benchmarks/ann_index_curve_benchmark.py --sizes 10000 50000 --dim 128 --queries 100 --top-k 10 --engines faiss-persisted qdrant-service pgvector --output benchmarks/production_index_profile_results.json
+docker compose -f examples/production-index-profile/docker-compose.yml up -d qdrant postgres
+docker compose -f examples/production-index-profile/docker-compose.yml run --rm benchmark
+docker compose -f examples/production-index-profile/docker-compose.yml down
 ```
 Checked-in 50000-vector point:
@@ -1059,15 +1295,62 @@ Checked-in 50000-vector point:
 | WaveMind faiss | skipped | - | - | - |
 | Qdrant local | 1.000 | 43.49 ms | 59.68 ms | 17525.7 ms |
+Checked-in production 50000-vector point:
+| engine | recall@10 | avg latency | p95 latency | build |
+|---|---:|---:|---:|---:|
+| WaveMind faiss-persisted | 1.000 | 3.52 ms | 7.88 ms | 715.9 ms |
+| Qdrant service | 1.000 | 4.41 ms | 5.93 ms | 12269.8 ms |
+| WaveMind pgvector | 0.811 | 10.95 ms | 15.69 ms | 185048.9 ms |
+Checked-in production load points:
+```sh
+python benchmarks/production_load_benchmark.py --sizes 100000 --dim 128 --queries 100 --top-k 10 --engines qdrant-service pgvector faiss-persisted
+python benchmarks/production_load_benchmark.py --sizes 1000000 --dim 128 --queries 50 --top-k 10 --engines qdrant-service --output benchmarks/production_load_qdrant_1m_results.json
+```
+| vectors | engine | recall@10 | avg latency | p95 latency | build |
+|---:|---|---:|---:|---:|---:|
+| 100000 | Qdrant service | 1.000 | 10.76 ms | 18.78 ms | 39873.2 ms |
+| 100000 | WaveMind pgvector | 0.736 | 17.76 ms | 23.48 ms | 455703.7 ms |
+| 100000 | WaveMind faiss-persisted | skipped | - | - | - |
+| 1000000 | Qdrant service | 0.506 | 45.81 ms | 65.18 ms | 563945.5 ms |
 Read this as an engineering curve, not an official VectorDBBench result. Annoy
 is faster than exact NumPy at 50000 vectors but loses too much recall with the
 current settings. The new `quantized` backend compresses vectors and keeps
 `0.934` recall@10 on this run, but the current Python/NumPy kernel is slower
 than exact NumPy; it is a memory-footprint baseline, not a latency win yet.
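For readers comparing the recall@10 columns, here is a minimal sketch of how such a number is commonly computed. It assumes the ground truth comes from an exact brute-force search over the same vectors; the benchmark's own definition is not shown in this diff, and `recall_at_k` and `mean_recall_at_k` are hypothetical helper names.

```python
def recall_at_k(approx_ids, exact_ids, k=10):
    """Fraction of the exact top-k neighbours that the candidate index returned."""
    truth = set(exact_ids[:k])
    if not truth:
        return 0.0
    return len(truth & set(approx_ids[:k])) / len(truth)


def mean_recall_at_k(approx_runs, exact_runs, k=10):
    """Average recall@k over a batch of queries (one id list per query)."""
    scores = [recall_at_k(a, e, k) for a, e in zip(approx_runs, exact_runs)]
    return sum(scores) / len(scores) if scores else 0.0
```

With this reading, a recall@10 of `0.934` means that on average roughly nine of the ten exact neighbours are returned per query.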
-FAISS persistence, service-mode Qdrant, and pgvector are now explicit benchmark
-profiles. If a required package, service, or environment variable is missing,
-the runner marks that engine as `skipped` instead of silently falling back to
-another backend.
+FAISS persistence and service-mode Qdrant now both preserve exact recall
+(`1.000` recall@10) at 50000 generated vectors. The checked-in pgvector/HNSW
+profile uses `WAVEMIND_PGVECTOR_EF_SEARCH=400`, which improves recall
+materially but, at `0.811` recall@10, still misses the `0.95` production
+target and is slower than the other two profiles. In the 100k load profile,
+Qdrant service keeps `1.000` recall@10 and is already viable for candidate
+generation; in the 1M profile its recall@10 drops to `0.506`, so default
+service settings are not enough for production recall and need HNSW/search
+tuning before any million-memory claims.
+If a required package, service, or environment variable is missing, the runner
+marks that engine as `skipped` instead of silently falling back to another
+backend.
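The skip-instead-of-fallback behavior described above can be sketched roughly as follows. This is an illustration under assumptions, not the runner's code: `check_engine` and its return shape are hypothetical, and only the environment variable names (`WAVEMIND_QDRANT_URL`, `WAVEMIND_FAISS_PATH`, `WAVEMIND_PGVECTOR_DSN`) come from this README.

```python
import importlib.util
import os


def check_engine(name, module=None, env_var=None, environ=None):
    """Report whether an engine can run, with an explicit reason when it cannot.

    Hypothetical helper. `module` must be a top-level package name, because
    importlib.util.find_spec raises for dotted names with a missing parent.
    """
    environ = os.environ if environ is None else environ
    if module and importlib.util.find_spec(module) is None:
        return {"engine": name, "status": "skipped",
                "reason": f"package {module!r} is not installed"}
    if env_var and not environ.get(env_var):
        return {"engine": name, "status": "skipped",
                "reason": f"{env_var} is not set"}
    return {"engine": name, "status": "ready", "reason": None}


if __name__ == "__main__":
    # The caller records the skip; it never substitutes another backend.
    print(check_engine("qdrant-service", env_var="WAVEMIND_QDRANT_URL"))
```

Recording a reason next to each `skipped` row keeps a missing service from being mistaken for a measured result of some other backend.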
+### Memory Competitor Adapter Profile
+WaveMind includes a small dynamic-memory adapter profile for Mem0, Zep, and
+LangGraph persistent memory. It checks corrections, TTL, namespace isolation,
+and preference recall. Missing competitors are marked `skipped` with setup
+reasons instead of being approximated.
+```sh
+python benchmarks/memory_competitor_benchmark.py --engines wavemind mem0 zep langgraph
+```
+| engine | precision@1 | precision@3 | stale suppression | avg latency |
+|---|---:|---:|---:|---:|
+| WaveMind | 0.80 | 1.00 | 1.00 | 0.55 ms |
+| Mem0 | skipped | - | - | - |
+| Zep | skipped | - | - | - |
+| LangGraph persistent memory | skipped | - | - | - |
 ### Current Local Runs
@@ -1076,13 +1359,13 @@ Field memory dynamics benchmark:
 13 memories, 5 conflicting-fact queries, deterministic local encoder.
 This benchmark isolates the `MemoryFieldGraph`: related memories can spread
 activation, newer conflicting memories inhibit stale facts, graph energy decays,
-and active clusters can surface concept candidates.
+and active clusters can surface and persist concept memories.
 Full machine-readable result: `benchmarks/field_memory_dynamics_results.json`.
-| engine | precision@1 | precision@3 | stale suppression | concept formation | decay ratio | avg latency |
-|---|---:|---:|---:|---:|---:|---:|
-| WaveMind graph | 1.00 | 1.00 | 1.00 | 1.00 | 0.81 | 0.82 ms |
-| WaveMind static | 0.20 | 1.00 | 0.20 | 0.00 | 0.00 | 0.43 ms |
+| engine | precision@1 | precision@3 | stale suppression | concept formation | concept consolidation | decay ratio | avg latency |
+|---|---:|---:|---:|---:|---:|---:|---:|
+| WaveMind graph | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 0.81 | 1.81 ms |
+| WaveMind static | 0.20 | 1.00 | 0.20 | 0.00 | 0.00 | 0.00 | 0.48 ms |
 Run locally from a cloned repository:
@@ -1223,6 +1506,7 @@ If you already use Chroma for local memory, see the practical migration guide:
 - Optimal capacity on the current NumPy exact index is up to 1000 records.
 - At 5000 records, one-word `precision@1` is currently 0.72 with the hash encoder; many misses are ambiguous queries where another sentence containing the same word ranks first.
 - For `N > 5000`, the NumPy exact index is still reliable but scales linearly. Annoy is faster at 50000 vectors in the local curve, but current recall is only `0.730`; the `quantized` backend reaches `0.934` recall@10 but is slower than NumPy on the current kernel. Use FAISS or a production vector service before claiming large-scale ANN quality.
+- Run `wavemind scale-plan --target-memories <N>` before growing a deployment. It is a guardrail, not a benchmark replacement: it tells you when NumPy is no longer the right candidate index and which checks to run next.
 - `sentence-transformers/paraphrase-multilingual-mpnet-base-v2` requires about 420 MB of model files. Benchmark runners cache embeddings so retrieval latency is measured separately from model encoding latency.
 - The Chroma comparison currently uses shared precomputed hash embeddings to isolate retrieval/ranking behavior; semantic model comparisons should be run separately.
 - The BEIR SciFact run uses the hash encoder to isolate index/retrieval behavior. It is not a semantic embedding leaderboard result.
@@ -1243,10 +1527,10 @@ If you already use Chroma for local memory, see the practical migration guide:
 - The `quantized` backend is an explicit int8 candidate-index experiment. It
   reduces vector precision and must be benchmarked per workload before use.
 - The synthetic long-term memory evidence benchmark is useful for regression and product-shape proof, but public claims should lean on LoCoMo and LongMemEval instead.
-- The LongMemEval result is retrieval-only. It is not a full LongMemEval answer-generation leaderboard-equivalent score.
+- The main LongMemEval evidence result is retrieval-only. The checked-in Ollama answer-generation comparison now includes WaveMind, Chroma static, and Qdrant static over 50 questions, but it is still not a full LongMemEval leaderboard-equivalent score.
 - Qdrant baselines in this README use embedded local mode. Qdrant itself warns that local mode is not recommended above 20000 points; use the `qdrant-service` benchmark profile before making production latency claims.
 - MTEB, MIRACL, LMEB, official VectorDBBench, and RAGBench are listed as the public benchmark roadmap, not as completed results yet.
-- Ollama answer generation is implemented, but the current machine has no local Ollama model available and the local Ollama API returns 502/connection-reset. The checked-in answer file is extractive smoke only, not an LLM score.
+- Local Ollama answer generation now works with `qwen2.5:0.5b` and `qwen2.5:1.5b`. WaveMind leads the checked-in Chroma/Qdrant smoke comparison, but answer quality is still limited by small-model reasoning; rerun the comparison with stronger local/API models before making product claims.
 - Public benchmark adapters require optional datasets, heavier dependencies, or running services. They are intentionally outside the minimal `pip install wavemind` path.
 - Dynamic memory is slower than static Chroma in the current local benchmark: 25.26 ms vs 1.75 ms average query latency on this machine.
 - Current WaveMind-only dynamic checks keep `precision@1` at 1.00 through 5000 memories, but average latency is around 48-54 ms. The next optimization target is field/re-ranking latency, not basic recall quality.
