vector-engine 1.0.0__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (41)
  1. vector_engine-1.0.0/LICENSE +21 -0
  2. vector_engine-1.0.0/PKG-INFO +342 -0
  3. vector_engine-1.0.0/README.md +324 -0
  4. vector_engine-1.0.0/pyproject.toml +32 -0
  5. vector_engine-1.0.0/setup.cfg +4 -0
  6. vector_engine-1.0.0/tests/test_api_stability.py +42 -0
  7. vector_engine-1.0.0/tests/test_artifact_contracts.py +83 -0
  8. vector_engine-1.0.0/tests/test_core.py +39 -0
  9. vector_engine-1.0.0/tests/test_credibility_audit.py +80 -0
  10. vector_engine-1.0.0/tests/test_faiss_optional.py +27 -0
  11. vector_engine-1.0.0/tests/test_hardening.py +36 -0
  12. vector_engine-1.0.0/tests/test_ml_eval.py +37 -0
  13. vector_engine-1.0.0/tests/test_persistence_compat.py +37 -0
  14. vector_engine-1.0.0/tests/test_rag_reliability.py +31 -0
  15. vector_engine-1.0.0/tests/test_real_corpus_eval.py +78 -0
  16. vector_engine-1.0.0/tests/test_release_bundle.py +8 -0
  17. vector_engine-1.0.0/tests/test_v02_features.py +117 -0
  18. vector_engine-1.0.0/vector_engine/__init__.py +16 -0
  19. vector_engine-1.0.0/vector_engine/array.py +153 -0
  20. vector_engine-1.0.0/vector_engine/backends/__init__.py +13 -0
  21. vector_engine-1.0.0/vector_engine/backends/base.py +28 -0
  22. vector_engine-1.0.0/vector_engine/backends/bruteforce.py +107 -0
  23. vector_engine-1.0.0/vector_engine/backends/faiss_backend.py +123 -0
  24. vector_engine-1.0.0/vector_engine/backends/registry.py +15 -0
  25. vector_engine-1.0.0/vector_engine/eval/__init__.py +17 -0
  26. vector_engine-1.0.0/vector_engine/eval/retrieval.py +173 -0
  27. vector_engine-1.0.0/vector_engine/index.py +190 -0
  28. vector_engine-1.0.0/vector_engine/io/__init__.py +3 -0
  29. vector_engine-1.0.0/vector_engine/io/manifest.py +50 -0
  30. vector_engine-1.0.0/vector_engine/metric.py +58 -0
  31. vector_engine-1.0.0/vector_engine/ml/__init__.py +4 -0
  32. vector_engine-1.0.0/vector_engine/ml/clustering.py +56 -0
  33. vector_engine-1.0.0/vector_engine/ml/knn.py +71 -0
  34. vector_engine-1.0.0/vector_engine/results.py +15 -0
  35. vector_engine-1.0.0/vector_engine/training/__init__.py +3 -0
  36. vector_engine-1.0.0/vector_engine/training/hard_negative.py +140 -0
  37. vector_engine-1.0.0/vector_engine.egg-info/PKG-INFO +342 -0
  38. vector_engine-1.0.0/vector_engine.egg-info/SOURCES.txt +39 -0
  39. vector_engine-1.0.0/vector_engine.egg-info/dependency_links.txt +1 -0
  40. vector_engine-1.0.0/vector_engine.egg-info/requires.txt +12 -0
  41. vector_engine-1.0.0/vector_engine.egg-info/top_level.txt +1 -0
@@ -0,0 +1,21 @@
+ MIT License
+
+ Copyright (c) 2026 Vector Engine Contributors
+
+ Permission is hereby granted, free of charge, to any person obtaining a copy
+ of this software and associated documentation files (the "Software"), to deal
+ in the Software without restriction, including without limitation the rights
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ copies of the Software, and to permit persons to whom the Software is
+ furnished to do so, subject to the following conditions:
+
+ The above copyright notice and this permission notice shall be included in all
+ copies or substantial portions of the Software.
+
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ SOFTWARE.
@@ -0,0 +1,342 @@
+ Metadata-Version: 2.4
+ Name: vector-engine
+ Version: 1.0.0
+ Summary: ML-first vector computation and retrieval engine.
+ Author: Vector Engine Contributors
+ License-Expression: MIT
+ Requires-Python: >=3.10
+ Description-Content-Type: text/markdown
+ License-File: LICENSE
+ Requires-Dist: numpy>=1.24
+ Provides-Extra: faiss
+ Requires-Dist: faiss-cpu>=1.7.4; (platform_system != "Darwin" or platform_machine != "arm64") and extra == "faiss"
+ Provides-Extra: ml
+ Requires-Dist: scikit-learn>=1.3; extra == "ml"
+ Provides-Extra: dev
+ Requires-Dist: pytest>=7.4; extra == "dev"
+ Dynamic: license-file
+
+ # Vector Engine
+
+ ML-first vector computation and retrieval for Python.
+
+ Vector Engine provides one clean API for exact search, ANN backends, metadata-aware retrieval, and ML utilities such as kNN and retrieval metrics.
+
+ ## Why this exists
+
+ - ANN libraries are powerful but low-level and backend-specific.
+ - Vector DBs solve infra and ops, but many ML workflows need fast local experimentation.
+ - Existing ML APIs do not offer a unified, backend-pluggable vector layer for embedding-heavy work.
+
+ ## Install
+
+ ```bash
+ pip install -e .
+ ```
+
+ Optional extras:
+
+ ```bash
+ pip install -e ".[dev,ml]"
+ pip install -e ".[faiss]"
+ ```
+
+ ## API contracts (v1.0 target)
+
+ - `VectorArray` accepts only 2D arrays with shape `(n, d)` where `n > 0` and `d > 0`.
+ - `VectorArray` IDs must be unique and must be `int` or `str`.
+ - Metadata length must always align with the number of vectors.
+ - `VectorIndex.search(..., k=...)` requires `k` to be an integer `> 0`.
+ - Score direction is explicit in `Metric.higher_is_better`:
+   - cosine/ip: higher is better
+   - l2: lower is better
+ - `kmeans(..., random_state=...)` requires an integer seed and finite vectors.
+ - `mine_hard_negatives` supports `top1`, `topk_sample`, and `distance_band` strategies with explicit exclusion controls.
+ - Retrieval eval validates malformed ground-truth shapes/types with stable `eval_error` messages.
+
+ ## 60-second quickstart
+
+ ```python
+ import numpy as np
+ from vector_engine import VectorArray, VectorIndex
+
+ xb = VectorArray.from_numpy(
+     np.random.randn(1000, 384).astype("float32"),
+     ids=[f"doc-{i}" for i in range(1000)],
+     normalize=True,
+ )
+ xq = VectorArray.from_numpy(np.random.randn(2, 384).astype("float32"), normalize=True)
+
+ index = VectorIndex.create(xb, metric="cosine", backend="bruteforce")
+ res = index.search(xq, k=5)
+ print(res.ids[0], res.scores[0])
+ ```
+
+ ## Core API
+
+ - `VectorArray`: canonical vector storage with IDs and metadata.
+ - `Metric`: built-in and custom metric definitions.
+ - `VectorIndex`: backend-agnostic build/add/search/save/load.
+ - `vector_engine.ml`: `knn_classify`, `knn_regress`, `kmeans`.
+ - `vector_engine.training`: `mine_hard_negatives` with configurable sampling strategies.
+ - `vector_engine.eval`: `precision_at_k`, `recall_at_k`, `ndcg_at_k`, `retrieval_report`, `retrieval_report_detailed`, `batch_metrics_summary`.
+
+ ## Backend support matrix
+
+ | Backend | Search | Add | Save/Load | Custom Metric |
+ | --- | ---: | ---: | ---: | ---: |
+ | `bruteforce` | yes | yes | yes | yes |
+ | `faiss` | yes | yes | yes | no |
+
+ ## Examples and notebooks
+
+ - `notebooks/01_semantic_search.ipynb`
+ - `notebooks/02_knn_baseline.ipynb`
+ - `notebooks/03_recommender_similarity.ipynb`
+
+ ## Benchmarks
+
+ Run:
+
+ ```bash
+ python benchmarks/compare_bruteforce_vs_faiss.py --mode exact
+ ```
+
+ With exact-overlap gate and artifact output:
+
+ ```bash
+ python benchmarks/compare_bruteforce_vs_faiss.py --mode exact --min-flat-overlap 0.99 --output artifacts/benchmark_exact.json
+ ```
+
+ Benchmark matrix (publishable aggregate):
+
+ ```bash
+ python scripts/benchmark_matrix.py --mode exact --warmup 2 --loops 8 --seed 7 --min-flat-overlap 0.99 --output-dir artifacts/benchmark_matrix
+ ```
+
+ Compose publishable summary bundle:
+
+ ```bash
+ python scripts/publishable_results.py --matrix-summary artifacts/benchmark_matrix/matrix_summary.json --stability-summary artifacts/testing_runs/stability_summary_bruteforce_200.json --output artifacts/benchmark_matrix/publishable_results.v1.json
+ ```
+
+ ANN mode (optional):
+
+ ```bash
+ python benchmarks/compare_bruteforce_vs_faiss.py --mode ann
+ ```
+
+ The benchmark reports:
+
+ - `qps`: queries per second (higher is better)
+ - `latency_p50_ms` and `latency_p95_ms`: median and tail latency (lower is better)
+ - `overlap_vs_bruteforce`: top-k neighbor overlap against exact brute-force (closer to `1.0` is better)
+ - `memory_mb_estimate`: coarse in-process memory estimate for vector/query buffers
+
+ Recommended protocol for publishable results:
+
+ - Use a fixed seed and record hardware notes.
+ - Warm up before timed runs.
+ - Run at least 3 repeated trials and report median numbers.
+ - Keep dataset size (`n`, `d`, `nq`, `k`) fixed across backend comparisons.
+ - Include machine-readable matrix summary artifacts in release evidence.
+
+ ## Validation snapshot
+
+ Artifacts produced in this repo:
+
+ - Real-corpus-style 3-run reports:
+   - `artifacts/real_corpus_runs/run_1.json`
+   - `artifacts/real_corpus_runs/run_2.json`
+   - `artifacts/real_corpus_runs/run_3.json`
+ - Faiss Flat exact-equivalence checks (3 runs):
+   - `artifacts/faiss_equivalence/run_1.json`
+   - `artifacts/faiss_equivalence/run_2.json`
+   - `artifacts/faiss_equivalence/run_3.json`
+ - 200-run stability study:
+   - `artifacts/testing_runs/stability_runs_bruteforce_200.jsonl`
+   - `artifacts/testing_runs/stability_summary_bruteforce_200.json`
+   - `artifacts/testing_runs/stability_plot_p95_qps.png`
+ - Matrix benchmark summary:
+   - `artifacts/benchmark_matrix/matrix_summary.json`
+   - `artifacts/benchmark_matrix/publishable_results.v1.json`
+
+ Observed outcomes for the current mock/public-safe corpus:
+
+ - 3-run quality is identical across runs (`recall@1/3/6 = 1.0`, `ndcg@1/3/6 = 1.0`).
+ - 3-run latency envelope: p95 ranges from `0.0376 ms` to `0.0717 ms`.
+ - Faiss Flat exact mode achieves `overlap_vs_bruteforce = 1.0` for all 3 runs with `--min-flat-overlap 0.99`.
+ - In exact benchmark runs (`n=10000`, `d=128`, `nq=200`, `k=10`), Faiss Flat p95 latency is `4.17-15.03 ms` vs bruteforce `29.99-37.63 ms`.
+ - 200-run bruteforce study: p95 mean `0.0255 ms` (95% interval `0.0203-0.0547 ms`), QPS mean `188,097` (95% interval `117,499-214,111`).
+
+ Run the 200-run study:
+
+ ```bash
+ python3 scripts/stability_runs.py \
+     --embeddings artifacts/real_corpus_inputs/embeddings.npy \
+     --query-embeddings artifacts/real_corpus_inputs/query_embeddings.npy \
+     --ids artifacts/real_corpus_inputs/ids.json \
+     --ground-truth artifacts/real_corpus_inputs/ground_truth.json \
+     --metadata artifacts/real_corpus_inputs/metadata.json \
+     --backend bruteforce \
+     --k 6 \
+     --ks 1,3,6 \
+     --loops 5 \
+     --run-count 200 \
+     --output-dir artifacts/testing_runs \
+     --threshold-recall 0.75 \
+     --threshold-ndcg 0.70 \
+     --threshold-p95-ms 120
+ ```
+
+ Example result table format:
+
+ | Backend | QPS | p50 ms | p95 ms | overlap@k vs brute-force |
+ | --- | ---: | ---: | ---: | ---: |
+ | bruteforce | ... | ... | ... | 1.000 |
+ | faiss_flat | ... | ... | ... | ... |
+ | faiss_ivf (optional) | ... | ... | ... | ... |
+
+ ## Integration quickstarts
+
+ ### Local RAG app path
+
+ ```bash
+ pip install -e ".[dev,ml]"
+ python examples/minimal_rag_integration.py
+ ```
+
+ ### Batch evaluation path
+
+ ```bash
+ python scripts/rag_real_corpus_eval.py \
+     --embeddings artifacts/real_corpus_inputs/embeddings.npy \
+     --query-embeddings artifacts/real_corpus_inputs/query_embeddings.npy \
+     --ids artifacts/real_corpus_inputs/ids.json \
+     --ground-truth artifacts/real_corpus_inputs/ground_truth.json \
+     --metadata artifacts/real_corpus_inputs/metadata.json \
+     --output artifacts/real_corpus_runs/run_1.json \
+     --backend bruteforce \
+     --k 6 \
+     --ks 1,3,6 \
+     --loops 5 \
+     --threshold-recall 0.75 \
+     --threshold-ndcg 0.70 \
+     --threshold-p95-ms 120
+ ```
+
+ ### Benchmark interpretation path
+
+ ```bash
+ python benchmarks/compare_bruteforce_vs_faiss.py \
+     --mode exact \
+     --min-flat-overlap 0.99 \
+     --output artifacts/faiss_equivalence/run_1.json
+ ```
+
+ - If `overlap_vs_bruteforce` is near `1.0`, approximation risk is low for that configuration.
+ - Use `latency_p95_ms` for user-facing SLO decisions.
+ - Use repeated runs and report median values before publishing backend comparisons.
+
+ ### Minimal production path (copy-paste)
+
+ ```bash
+ pip install -e ".[dev,ml]"
+ python scripts/rag_baseline.py --output-dir artifacts --k 3
+ python scripts/rag_real_corpus_eval.py --embeddings ... --query-embeddings ... --ids ... --ground-truth ... --output artifacts/real_corpus_runs/run_1.json --backend bruteforce --k 10 --ks 1,5,10 --loops 5
+ python scripts/stability_runs.py --embeddings ... --query-embeddings ... --ids ... --ground-truth ... --backend bruteforce --run-count 200 --output-dir artifacts/testing_runs
+ python benchmarks/compare_bruteforce_vs_faiss.py --mode exact --min-flat-overlap 0.99 --output artifacts/faiss_equivalence/run_1.json
+ python scripts/benchmark_matrix.py --mode exact --warmup 2 --loops 8 --seed 7 --min-flat-overlap 0.99 --output-dir artifacts/benchmark_matrix
+ python scripts/publishable_results.py --matrix-summary artifacts/benchmark_matrix/matrix_summary.json --stability-summary artifacts/testing_runs/stability_summary_bruteforce_200.json --output artifacts/benchmark_matrix/publishable_results.v1.json
+ ```
+
+ Expected artifacts:
+
+ - `artifacts/rag_baseline_metrics.v1.json`
+ - `artifacts/real_corpus_runs/run_*.json`
+ - `artifacts/testing_runs/stability_summary_*.json`
+ - `artifacts/faiss_equivalence/run_*.json`
+ - `artifacts/benchmark_matrix/matrix_summary.json`
+ - `artifacts/benchmark_matrix/publishable_results.v1.json`
+
+ Further reading:
+
+ - `docs/integration_guides.md`
+ - `docs/reproducibility.md`
+ - `docs/kpi_charter.md`
+ - `docs/research_claims.md`
+ - `docs/credibility_audit.md`
+ - `docs/limitations.md`
+ - `docs/releases/v1.0.0.md`
+ - `docs/paper/reproducibility_appendix.md`
+
+ ## Publication and release bundle
+
+ Generate a release-bundle manifest that checks required docs/governance/evidence files:
+
+ ```bash
+ python scripts/build_release_bundle.py --output-dir artifacts/release_bundle
+ ```
+
+ ## Artifact policy (publish vs private)
+
+ - Safe to publish:
+   - benchmark result summaries
+   - stability aggregate summaries
+   - synthetic/mock input examples
+ - Keep private:
+   - real corpus raw embeddings
+   - query embeddings derived from private data
+   - sensitive metadata and ID mappings
+ - Recommended:
+   - commit docs and summary metrics in the repo
+   - keep private input blobs in external storage
+
+ ## Project adoption checklist
+
+ - Install: `pip install -e ".[dev,ml]"` and optional `.[faiss]`.
+ - Validation: run `pytest -q`.
+ - Quality baseline: run `python scripts/rag_baseline.py`.
+ - Real corpus eval: run `python scripts/rag_real_corpus_eval.py --embeddings ... --query-embeddings ... --ids ... --ground-truth ... --threshold-recall 0.75 --threshold-ndcg 0.70 --threshold-p95-ms 120`.
+ - Persistence: verify `VectorIndex.save/load` on your own embeddings snapshot.
+ - Performance: run the benchmark script with your target `n`, `d`, `nq`, `k`.
+ - Integration: run `python examples/minimal_rag_integration.py`.
+
+ ## Feature snapshot
+
+ - `kmeans` returns rich outputs (`labels`, `centers`, `inertia`, `n_iter`) with deterministic validation.
+ - Hard-negative mining supports `top1`, `topk_sample`, and `distance_band`, plus `exclude_ids` / `exclude_mask`.
+ - Retrieval evaluation includes `retrieval_report_detailed(include_per_query=...)` and `batch_metrics_summary(include_std=True)`.
+ - Public demo bootstrap is available under `demo_repo_template/`.
+
+ ## v1.0 readiness gates
+
+ - Benchmark matrix artifacts produced with fixed protocol and environment metadata.
+ - Stability harness demonstrates repeatability for latency/QPS/quality summaries.
+ - API stability contract documented in `docs/api.md` and enforced in `tests/test_api_stability.py`.
+ - Release packaging includes reproducible command blocks and artifact policy.
+
+ ## Governance and trust
+
+ - `LICENSE`
+ - `CITATION.cff`
+
+ ## Error cases
+
+ Stable error prefixes are used for fast debugging:
+
+ - `vector_array_error`: malformed array, IDs, metadata, subset lookup
+ - `metric_error`: unsupported or invalid metric definitions
+ - `index_error`: index lifecycle/search/add/persistence consistency issues
+ - `manifest_error`: missing/unsupported manifest fields or version
+
+ ## Troubleshooting
+
+ - **Faiss not available**
+   - Install with `pip install -e ".[faiss]"`.
+ - **Dimension mismatch at search/add**
+   - Ensure both base vectors and query vectors use the same embedding dimension.
+ - **Metric confusion**
+   - For cosine similarity, pass normalized vectors or set `normalize=True`.
+ - **Persistence load failure**
+   - Check manifest version compatibility and whether artifacts were modified after save.
@@ -0,0 +1,324 @@
+ # Vector Engine
+
+ ML-first vector computation and retrieval for Python.
+
+ Vector Engine provides one clean API for exact search, ANN backends, metadata-aware retrieval, and ML utilities such as kNN and retrieval metrics.
+
+ ## Why this exists
+
+ - ANN libraries are powerful but low-level and backend-specific.
+ - Vector DBs solve infra and ops, but many ML workflows need fast local experimentation.
+ - Existing ML APIs do not offer a unified, backend-pluggable vector layer for embedding-heavy work.
+
+ ## Install
+
+ ```bash
+ pip install -e .
+ ```
+
+ Optional extras:
+
+ ```bash
+ pip install -e ".[dev,ml]"
+ pip install -e ".[faiss]"
+ ```
+
+ ## API contracts (v1.0 target)
+
+ - `VectorArray` accepts only 2D arrays with shape `(n, d)` where `n > 0` and `d > 0`.
+ - `VectorArray` IDs must be unique and must be `int` or `str`.
+ - Metadata length must always align with the number of vectors.
+ - `VectorIndex.search(..., k=...)` requires `k` to be an integer `> 0`.
+ - Score direction is explicit in `Metric.higher_is_better`:
+   - cosine/ip: higher is better
+   - l2: lower is better
+ - `kmeans(..., random_state=...)` requires an integer seed and finite vectors.
+ - `mine_hard_negatives` supports `top1`, `topk_sample`, and `distance_band` strategies with explicit exclusion controls.
+ - Retrieval eval validates malformed ground-truth shapes/types with stable `eval_error` messages.
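The contracts above can be mirrored by caller-side validation before data ever reaches the engine. A minimal sketch of such a check (a hypothetical helper, not part of the package API), assuming vectors arrive as a list of equal-length rows:

```python
def validate_vectors(rows, ids=None, metadata=None):
    """Check the v1.0-style input contracts before building an index."""
    # Shape contract: non-empty 2D data with n > 0 and d > 0.
    if not rows or not rows[0]:
        raise ValueError("vector_array_error: expected shape (n, d) with n > 0 and d > 0")
    d = len(rows[0])
    if any(len(r) != d for r in rows):
        raise ValueError("vector_array_error: ragged rows; all vectors must share dimension d")
    # ID contract: one per vector, unique, int or str.
    if ids is not None:
        if len(ids) != len(rows):
            raise ValueError("vector_array_error: ids length must match number of vectors")
        if any(not isinstance(i, (int, str)) for i in ids):
            raise ValueError("vector_array_error: ids must be int or str")
        if len(set(ids)) != len(ids):
            raise ValueError("vector_array_error: ids must be unique")
    # Metadata contract: one entry per vector.
    if metadata is not None and len(metadata) != len(rows):
        raise ValueError("vector_array_error: metadata length must match number of vectors")
    return len(rows), d
```

Failing fast like this keeps the stable `vector_array_error` prefix close to the malformed input, which simplifies triage.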
+
+ ## 60-second quickstart
+
+ ```python
+ import numpy as np
+ from vector_engine import VectorArray, VectorIndex
+
+ xb = VectorArray.from_numpy(
+     np.random.randn(1000, 384).astype("float32"),
+     ids=[f"doc-{i}" for i in range(1000)],
+     normalize=True,
+ )
+ xq = VectorArray.from_numpy(np.random.randn(2, 384).astype("float32"), normalize=True)
+
+ index = VectorIndex.create(xb, metric="cosine", backend="bruteforce")
+ res = index.search(xq, k=5)
+ print(res.ids[0], res.scores[0])
+ ```
+
+ ## Core API
+
+ - `VectorArray`: canonical vector storage with IDs and metadata.
+ - `Metric`: built-in and custom metric definitions.
+ - `VectorIndex`: backend-agnostic build/add/search/save/load.
+ - `vector_engine.ml`: `knn_classify`, `knn_regress`, `kmeans`.
+ - `vector_engine.training`: `mine_hard_negatives` with configurable sampling strategies.
+ - `vector_engine.eval`: `precision_at_k`, `recall_at_k`, `ndcg_at_k`, `retrieval_report`, `retrieval_report_detailed`, `batch_metrics_summary`.
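For intuition about what the eval helpers compute, here is a self-contained sketch of the standard binary-relevance definitions of recall@k and NDCG@k; the package's `recall_at_k` / `ndcg_at_k` may differ in edge-case handling:

```python
import math

def recall_at_k(retrieved, relevant, k):
    """Fraction of the relevant ids that appear in the top-k retrieved list."""
    if not relevant:
        return 0.0
    rel = set(relevant)
    hits = sum(1 for doc in retrieved[:k] if doc in rel)
    return hits / len(relevant)

def ndcg_at_k(retrieved, relevant, k):
    """Binary-relevance NDCG: DCG of the ranking divided by the ideal DCG."""
    rel = set(relevant)
    dcg = sum(1.0 / math.log2(rank + 2)
              for rank, doc in enumerate(retrieved[:k]) if doc in rel)
    ideal = sum(1.0 / math.log2(rank + 2) for rank in range(min(len(rel), k)))
    return dcg / ideal if ideal > 0 else 0.0
```

Recall ignores rank order within the top-k window; NDCG rewards placing relevant ids earlier, which is why both are reported together.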
+
+ ## Backend support matrix
+
+ | Backend | Search | Add | Save/Load | Custom Metric |
+ | --- | ---: | ---: | ---: | ---: |
+ | `bruteforce` | yes | yes | yes | yes |
+ | `faiss` | yes | yes | yes | no |
+
+ ## Examples and notebooks
+
+ - `notebooks/01_semantic_search.ipynb`
+ - `notebooks/02_knn_baseline.ipynb`
+ - `notebooks/03_recommender_similarity.ipynb`
+
+ ## Benchmarks
+
+ Run:
+
+ ```bash
+ python benchmarks/compare_bruteforce_vs_faiss.py --mode exact
+ ```
+
+ With exact-overlap gate and artifact output:
+
+ ```bash
+ python benchmarks/compare_bruteforce_vs_faiss.py --mode exact --min-flat-overlap 0.99 --output artifacts/benchmark_exact.json
+ ```
+
+ Benchmark matrix (publishable aggregate):
+
+ ```bash
+ python scripts/benchmark_matrix.py --mode exact --warmup 2 --loops 8 --seed 7 --min-flat-overlap 0.99 --output-dir artifacts/benchmark_matrix
+ ```
+
+ Compose publishable summary bundle:
+
+ ```bash
+ python scripts/publishable_results.py --matrix-summary artifacts/benchmark_matrix/matrix_summary.json --stability-summary artifacts/testing_runs/stability_summary_bruteforce_200.json --output artifacts/benchmark_matrix/publishable_results.v1.json
+ ```
+
+ ANN mode (optional):
+
+ ```bash
+ python benchmarks/compare_bruteforce_vs_faiss.py --mode ann
+ ```
+
+ The benchmark reports:
+
+ - `qps`: queries per second (higher is better)
+ - `latency_p50_ms` and `latency_p95_ms`: median and tail latency (lower is better)
+ - `overlap_vs_bruteforce`: top-k neighbor overlap against exact brute-force (closer to `1.0` is better)
+ - `memory_mb_estimate`: coarse in-process memory estimate for vector/query buffers
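The overlap metric is the one number that ties ANN results back to exact search. One common way to define it, sketched here as a hypothetical standalone helper (the benchmark script may compute it slightly differently), is the mean per-query set overlap between the ANN top-k and the brute-force top-k:

```python
def overlap_vs_bruteforce(ann_ids, exact_ids, k):
    """Mean per-query |top-k(ann) ∩ top-k(exact)| / k over all queries."""
    total = 0.0
    for ann, exact in zip(ann_ids, exact_ids):
        # Order within the top-k window is ignored; only membership counts.
        total += len(set(ann[:k]) & set(exact[:k])) / k
    return total / len(ann_ids)
```

A value of `1.0` means the ANN backend returned exactly the brute-force neighbor sets for every query, which is what the `--min-flat-overlap 0.99` gate enforces for Faiss Flat.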
+
+ Recommended protocol for publishable results:
+
+ - Use a fixed seed and record hardware notes.
+ - Warm up before timed runs.
+ - Run at least 3 repeated trials and report median numbers.
+ - Keep dataset size (`n`, `d`, `nq`, `k`) fixed across backend comparisons.
+ - Include machine-readable matrix summary artifacts in release evidence.
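The warmup-then-median part of this protocol can be sketched in a few lines; `timed_trials` and `search_fn` below are illustrative names, not part of the benchmark scripts:

```python
import statistics
import time

def timed_trials(search_fn, queries, warmup=2, trials=5):
    """Warm up, then time repeated full-batch runs and report median figures."""
    for _ in range(warmup):           # warm caches before any measurement
        search_fn(queries)
    per_query_ms, qps_runs = [], []
    for _ in range(trials):
        t0 = time.perf_counter()
        search_fn(queries)
        elapsed = time.perf_counter() - t0
        per_query_ms.append(1000.0 * elapsed / len(queries))
        qps_runs.append(len(queries) / elapsed)
    # Medians are robust to the occasional slow outlier run.
    return {
        "latency_p50_ms": statistics.median(per_query_ms),
        "qps_median": statistics.median(qps_runs),
    }
```

Reporting medians over several trials, rather than a single run, is what makes cross-backend numbers comparable.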
+
+ ## Validation snapshot
+
+ Artifacts produced in this repo:
+
+ - Real-corpus-style 3-run reports:
+   - `artifacts/real_corpus_runs/run_1.json`
+   - `artifacts/real_corpus_runs/run_2.json`
+   - `artifacts/real_corpus_runs/run_3.json`
+ - Faiss Flat exact-equivalence checks (3 runs):
+   - `artifacts/faiss_equivalence/run_1.json`
+   - `artifacts/faiss_equivalence/run_2.json`
+   - `artifacts/faiss_equivalence/run_3.json`
+ - 200-run stability study:
+   - `artifacts/testing_runs/stability_runs_bruteforce_200.jsonl`
+   - `artifacts/testing_runs/stability_summary_bruteforce_200.json`
+   - `artifacts/testing_runs/stability_plot_p95_qps.png`
+ - Matrix benchmark summary:
+   - `artifacts/benchmark_matrix/matrix_summary.json`
+   - `artifacts/benchmark_matrix/publishable_results.v1.json`
+
+ Observed outcomes for the current mock/public-safe corpus:
+
+ - 3-run quality is identical across runs (`recall@1/3/6 = 1.0`, `ndcg@1/3/6 = 1.0`).
+ - 3-run latency envelope: p95 ranges from `0.0376 ms` to `0.0717 ms`.
+ - Faiss Flat exact mode achieves `overlap_vs_bruteforce = 1.0` for all 3 runs with `--min-flat-overlap 0.99`.
+ - In exact benchmark runs (`n=10000`, `d=128`, `nq=200`, `k=10`), Faiss Flat p95 latency is `4.17-15.03 ms` vs bruteforce `29.99-37.63 ms`.
+ - 200-run bruteforce study: p95 mean `0.0255 ms` (95% interval `0.0203-0.0547 ms`), QPS mean `188,097` (95% interval `117,499-214,111`).
+
+ Run the 200-run study:
+
+ ```bash
+ python3 scripts/stability_runs.py \
+     --embeddings artifacts/real_corpus_inputs/embeddings.npy \
+     --query-embeddings artifacts/real_corpus_inputs/query_embeddings.npy \
+     --ids artifacts/real_corpus_inputs/ids.json \
+     --ground-truth artifacts/real_corpus_inputs/ground_truth.json \
+     --metadata artifacts/real_corpus_inputs/metadata.json \
+     --backend bruteforce \
+     --k 6 \
+     --ks 1,3,6 \
+     --loops 5 \
+     --run-count 200 \
+     --output-dir artifacts/testing_runs \
+     --threshold-recall 0.75 \
+     --threshold-ndcg 0.70 \
+     --threshold-p95-ms 120
+ ```
+
+ Example result table format:
+
+ | Backend | QPS | p50 ms | p95 ms | overlap@k vs brute-force |
+ | --- | ---: | ---: | ---: | ---: |
+ | bruteforce | ... | ... | ... | 1.000 |
+ | faiss_flat | ... | ... | ... | ... |
+ | faiss_ivf (optional) | ... | ... | ... | ... |
+
+ ## Integration quickstarts
+
+ ### Local RAG app path
+
+ ```bash
+ pip install -e ".[dev,ml]"
+ python examples/minimal_rag_integration.py
+ ```
+
+ ### Batch evaluation path
+
+ ```bash
+ python scripts/rag_real_corpus_eval.py \
+     --embeddings artifacts/real_corpus_inputs/embeddings.npy \
+     --query-embeddings artifacts/real_corpus_inputs/query_embeddings.npy \
+     --ids artifacts/real_corpus_inputs/ids.json \
+     --ground-truth artifacts/real_corpus_inputs/ground_truth.json \
+     --metadata artifacts/real_corpus_inputs/metadata.json \
+     --output artifacts/real_corpus_runs/run_1.json \
+     --backend bruteforce \
+     --k 6 \
+     --ks 1,3,6 \
+     --loops 5 \
+     --threshold-recall 0.75 \
+     --threshold-ndcg 0.70 \
+     --threshold-p95-ms 120
+ ```
+
+ ### Benchmark interpretation path
+
+ ```bash
+ python benchmarks/compare_bruteforce_vs_faiss.py \
+     --mode exact \
+     --min-flat-overlap 0.99 \
+     --output artifacts/faiss_equivalence/run_1.json
+ ```
+
+ - If `overlap_vs_bruteforce` is near `1.0`, approximation risk is low for that configuration.
+ - Use `latency_p95_ms` for user-facing SLO decisions.
+ - Use repeated runs and report median values before publishing backend comparisons.
+
+ ### Minimal production path (copy-paste)
+
+ ```bash
+ pip install -e ".[dev,ml]"
+ python scripts/rag_baseline.py --output-dir artifacts --k 3
+ python scripts/rag_real_corpus_eval.py --embeddings ... --query-embeddings ... --ids ... --ground-truth ... --output artifacts/real_corpus_runs/run_1.json --backend bruteforce --k 10 --ks 1,5,10 --loops 5
+ python scripts/stability_runs.py --embeddings ... --query-embeddings ... --ids ... --ground-truth ... --backend bruteforce --run-count 200 --output-dir artifacts/testing_runs
+ python benchmarks/compare_bruteforce_vs_faiss.py --mode exact --min-flat-overlap 0.99 --output artifacts/faiss_equivalence/run_1.json
+ python scripts/benchmark_matrix.py --mode exact --warmup 2 --loops 8 --seed 7 --min-flat-overlap 0.99 --output-dir artifacts/benchmark_matrix
+ python scripts/publishable_results.py --matrix-summary artifacts/benchmark_matrix/matrix_summary.json --stability-summary artifacts/testing_runs/stability_summary_bruteforce_200.json --output artifacts/benchmark_matrix/publishable_results.v1.json
+ ```
+
+ Expected artifacts:
+
+ - `artifacts/rag_baseline_metrics.v1.json`
+ - `artifacts/real_corpus_runs/run_*.json`
+ - `artifacts/testing_runs/stability_summary_*.json`
+ - `artifacts/faiss_equivalence/run_*.json`
+ - `artifacts/benchmark_matrix/matrix_summary.json`
+ - `artifacts/benchmark_matrix/publishable_results.v1.json`
+
+ Further reading:
+
+ - `docs/integration_guides.md`
+ - `docs/reproducibility.md`
+ - `docs/kpi_charter.md`
+ - `docs/research_claims.md`
+ - `docs/credibility_audit.md`
+ - `docs/limitations.md`
+ - `docs/releases/v1.0.0.md`
+ - `docs/paper/reproducibility_appendix.md`
+
+ ## Publication and release bundle
+
+ Generate a release-bundle manifest that checks required docs/governance/evidence files:
+
+ ```bash
+ python scripts/build_release_bundle.py --output-dir artifacts/release_bundle
+ ```
+
+ ## Artifact policy (publish vs private)
+
+ - Safe to publish:
+   - benchmark result summaries
+   - stability aggregate summaries
+   - synthetic/mock input examples
+ - Keep private:
+   - real corpus raw embeddings
+   - query embeddings derived from private data
+   - sensitive metadata and ID mappings
+ - Recommended:
+   - commit docs and summary metrics in the repo
+   - keep private input blobs in external storage
+
+ ## Project adoption checklist
+
+ - Install: `pip install -e ".[dev,ml]"` and optional `.[faiss]`.
+ - Validation: run `pytest -q`.
+ - Quality baseline: run `python scripts/rag_baseline.py`.
+ - Real corpus eval: run `python scripts/rag_real_corpus_eval.py --embeddings ... --query-embeddings ... --ids ... --ground-truth ... --threshold-recall 0.75 --threshold-ndcg 0.70 --threshold-p95-ms 120`.
+ - Persistence: verify `VectorIndex.save/load` on your own embeddings snapshot.
+ - Performance: run the benchmark script with your target `n`, `d`, `nq`, `k`.
+ - Integration: run `python examples/minimal_rag_integration.py`.
+
+ ## Feature snapshot
+
+ - `kmeans` returns rich outputs (`labels`, `centers`, `inertia`, `n_iter`) with deterministic validation.
+ - Hard-negative mining supports `top1`, `topk_sample`, and `distance_band`, plus `exclude_ids` / `exclude_mask`.
+ - Retrieval evaluation includes `retrieval_report_detailed(include_per_query=...)` and `batch_metrics_summary(include_std=True)`.
+ - Public demo bootstrap is available under `demo_repo_template/`.
+
+ ## v1.0 readiness gates
+
+ - Benchmark matrix artifacts produced with fixed protocol and environment metadata.
+ - Stability harness demonstrates repeatability for latency/QPS/quality summaries.
+ - API stability contract documented in `docs/api.md` and enforced in `tests/test_api_stability.py`.
+ - Release packaging includes reproducible command blocks and artifact policy.
+
+ ## Governance and trust
+
+ - `LICENSE`
+ - `CITATION.cff`
+
+ ## Error cases
+
+ Stable error prefixes are used for fast debugging:
+
+ - `vector_array_error`: malformed array, IDs, metadata, subset lookup
+ - `metric_error`: unsupported or invalid metric definitions
+ - `index_error`: index lifecycle/search/add/persistence consistency issues
+ - `manifest_error`: missing/unsupported manifest fields or version
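Because the prefixes are stable, callers can route failures without parsing full messages. A minimal sketch of such a dispatcher (`classify_error` is a hypothetical caller-side helper):

```python
STABLE_PREFIXES = ("vector_array_error", "metric_error", "index_error", "manifest_error")

def classify_error(exc):
    """Map an engine exception to its stable prefix for triage or alerting."""
    message = str(exc)
    for prefix in STABLE_PREFIXES:
        if message.startswith(prefix):
            return prefix
    return "unknown_error"
```

This pattern lets monitoring distinguish, say, persistence problems (`manifest_error`) from bad input data (`vector_array_error`) even as message details evolve.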
+
+ ## Troubleshooting
+
+ - **Faiss not available**
+   - Install with `pip install -e ".[faiss]"`.
+ - **Dimension mismatch at search/add**
+   - Ensure both base vectors and query vectors use the same embedding dimension.
+ - **Metric confusion**
+   - For cosine similarity, pass normalized vectors or set `normalize=True`.
+ - **Persistence load failure**
+   - Check manifest version compatibility and whether artifacts were modified after save.
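The cosine tip above comes down to L2 normalization: once vectors have unit length, a plain dot product equals cosine similarity. A self-contained sketch (hypothetical helpers, independent of the package's `normalize=True` implementation):

```python
import math

def l2_normalize(vec):
    """Scale a vector to unit length so dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("vector_array_error: cannot normalize a zero vector")
    return [x / norm for x in vec]

def dot(a, b):
    """Inner product; equals cosine similarity when both inputs are unit-length."""
    return sum(x * y for x, y in zip(a, b))
```

If unnormalized vectors are indexed with an inner-product metric, longer vectors dominate the ranking regardless of direction, which is the usual source of "metric confusion".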