@1mbrain/benchmarks 0.1.1

Files changed (69)
  1. package/README.md +85 -0
  2. package/fixtures/1mbrain-focused-mini/1mbrain-focused-mini.json +928 -0
  3. package/fixtures/1mbrain-focused-mini/README.md +45 -0
  4. package/fixtures/adversarial-memory/dataset_claude_adversarial.json +3333 -0
  5. package/fixtures/adversarial-memory/dataset_gemini_adversarial_memory.json +2984 -0
  6. package/fixtures/balanced-mini/dataset_claude_balanced_mini.json +2077 -0
  7. package/fixtures/balanced-mini/dataset_gemini_balanced_mini.json +1995 -0
  8. package/fixtures/generate_datasets.js +1741 -0
  9. package/fixtures/graph-stress-hard/README.md +43 -0
  10. package/fixtures/graph-stress-hard/dataset_graph_stress_hard.json +4374 -0
  11. package/fixtures/graph-stress-hard/generate_graph_stress_hard.js +526 -0
  12. package/fixtures/realistic-medium/dataset_claude_realistic_medium.json +7462 -0
  13. package/fixtures/realistic-medium/dataset_gemini_realistic_medium.json +7277 -0
  14. package/fixtures/realistic-medium/gen_claude_medium.js +600 -0
  15. package/package.json +22 -0
  16. package/reports/benchmark_report.md +48 -0
  17. package/reports/benchmark_report_claude_adversarial.md +42 -0
  18. package/reports/benchmark_report_claude_adversarial_adaptive.md +42 -0
  19. package/reports/benchmark_report_claude_adversarial_adaptive2_fast.md +42 -0
  20. package/reports/benchmark_report_claude_adversarial_adaptive_fast.md +42 -0
  21. package/reports/benchmark_report_claude_adversarial_rerank.md +42 -0
  22. package/reports/benchmark_report_claude_balanced_mini.md +42 -0
  23. package/reports/benchmark_report_claude_balanced_mini_adaptive.md +42 -0
  24. package/reports/benchmark_report_claude_balanced_mini_adaptive2_fast.md +42 -0
  25. package/reports/benchmark_report_claude_balanced_mini_adaptive_fast.md +42 -0
  26. package/reports/benchmark_report_claude_balanced_mini_rerank.md +42 -0
  27. package/reports/benchmark_report_claude_realistic_medium.md +42 -0
  28. package/reports/benchmark_report_claude_realistic_medium_adaptive.md +42 -0
  29. package/reports/benchmark_report_claude_realistic_medium_adaptive2_fast.md +42 -0
  30. package/reports/benchmark_report_claude_realistic_medium_adaptive_fast.md +42 -0
  31. package/reports/benchmark_report_claude_realistic_medium_evidence_rerank_local.md +42 -0
  32. package/reports/benchmark_report_claude_realistic_medium_openai_evidence_rerank.md +41 -0
  33. package/reports/benchmark_report_claude_realistic_medium_openai_multi_signal.md +41 -0
  34. package/reports/benchmark_report_claude_realistic_medium_openai_multi_signal_scoped.md +41 -0
  35. package/reports/benchmark_report_claude_realistic_medium_openai_phase8_no_judge.md +42 -0
  36. package/reports/benchmark_report_claude_realistic_medium_openai_rankingpolicy.md +41 -0
  37. package/reports/benchmark_report_claude_realistic_medium_openai_stale_filter.md +41 -0
  38. package/reports/benchmark_report_claude_realistic_medium_openai_stale_filter_absence_fix.md +41 -0
  39. package/reports/benchmark_report_claude_realistic_medium_openai_write_time_invalidation.md +41 -0
  40. package/reports/benchmark_report_claude_realistic_medium_rerank.md +42 -0
  41. package/reports/benchmark_report_claude_realistic_medium_stale_filter_local.md +42 -0
  42. package/reports/benchmark_report_graph_stress_hard.md +42 -0
  43. package/reports/benchmark_report_graph_stress_hard_absence_fix.md +42 -0
  44. package/reports/benchmark_report_graph_stress_hard_adaptive.md +42 -0
  45. package/reports/benchmark_report_graph_stress_hard_evidence_rerank.md +42 -0
  46. package/reports/benchmark_report_graph_stress_hard_multi_signal_current_guardrail.md +42 -0
  47. package/reports/benchmark_report_graph_stress_hard_multi_signal_guardrail_fixed.md +42 -0
  48. package/reports/benchmark_report_graph_stress_hard_multi_signal_local.md +42 -0
  49. package/reports/benchmark_report_graph_stress_hard_multi_signal_scoped_guardrail.md +42 -0
  50. package/reports/benchmark_report_graph_stress_hard_multi_signal_vector_pure_guardrail.md +42 -0
  51. package/reports/benchmark_report_graph_stress_hard_phase8_sdk_guardrail.md +42 -0
  52. package/reports/benchmark_report_graph_stress_hard_rerank.md +42 -0
  53. package/reports/benchmark_report_graph_stress_hard_stale_filter.md +42 -0
  54. package/reports/benchmark_report_graph_stress_hard_write_time_invalidation.md +42 -0
  55. package/results/.gitignore +2 -0
  56. package/src/adapters/1mbrain.ts +317 -0
  57. package/src/adapters/keyword-embedding.ts +48 -0
  58. package/src/adapters/mem0.ts +124 -0
  59. package/src/adapters/qdrant.ts +214 -0
  60. package/src/adapters/unavailable.ts +49 -0
  61. package/src/adapters/vector-baseline.ts +149 -0
  62. package/src/datasets/focused-mini.ts +158 -0
  63. package/src/datasets/synthetic-agent-memory.ts +532 -0
  64. package/src/llm-evaluator.ts +262 -0
  65. package/src/metrics.ts +482 -0
  66. package/src/provider.ts +151 -0
  67. package/src/runner.ts +635 -0
  68. package/tsconfig.json +10 -0
  69. package/tsconfig.tsbuildinfo +1 -0
package/README.md ADDED
@@ -0,0 +1,85 @@
+ # 1MBrain Provider Benchmarks
+
+ Provider-level benchmarks compare memory engines without involving an LLM agent.
+ Hermes or other agents should be tested later as an end-to-end validation layer.
+
+ ## Run
+
+ ```bash
+ npm run build --workspace=packages/core
+ npm run bench --workspace=packages/benchmarks
+ ```
+
+ ## Focused Mini Dataset
+
+ A small GitHub-friendly fixture is available at
+ `packages/benchmarks/fixtures/1mbrain-focused-mini/1mbrain-focused-mini.json`.
+ It contains 5 directed conversations, 41 memory records, and 23 questions that
+ target 1MBrain-specific behavior without requiring an LLM judge.
+
+ To run the benchmark at several dataset scales, set `BENCH_SCALES`:
+
+ ```bash
+ BENCH_SCALES=1,10,100 npm run bench --workspace=packages/benchmarks
+ ```
+
+ To restrict the run to specific providers, set `BENCH_PROVIDERS`:
+
+ ```bash
+ BENCH_PROVIDERS=1mbrain-sqlite-vector-bulk,1mbrain-sqlite-graph-bulk,qdrant-vector npm run bench --workspace=packages/benchmarks
+ ```
+
+ Each scale adds 50 noise memories plus the fixed ground-truth memories.
+
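As a rough illustration of how the two environment variables above could be interpreted, here is a minimal TypeScript sketch. It is not taken from `src/runner.ts`: the names `parseBenchEnv` and `datasetSize`, the default scale of `1`, and the reading that noise grows linearly with the scale factor (50 noise memories per unit of scale) are all assumptions made for the example.

```typescript
// Hypothetical helpers for illustration; the package's runner may differ.
interface BenchEnv {
  BENCH_SCALES?: string;
  BENCH_PROVIDERS?: string;
}

interface BenchSelection {
  scales: number[];           // dataset scale factors to run
  providers: string[] | null; // null means "no filter, run every available provider"
}

function parseBenchEnv(env: BenchEnv, defaultScales: number[] = [1]): BenchSelection {
  // Split the comma-separated list, dropping blanks such as a trailing comma.
  const scales = (env.BENCH_SCALES ?? "")
    .split(",")
    .map((s) => s.trim())
    .filter((s) => s.length > 0)
    .map((s) => Number(s));

  for (const scale of scales) {
    if (!Number.isInteger(scale) || scale <= 0) {
      throw new Error(`BENCH_SCALES must be positive integers, got "${env.BENCH_SCALES}"`);
    }
  }

  const providers = (env.BENCH_PROVIDERS ?? "")
    .split(",")
    .map((p) => p.trim())
    .filter((p) => p.length > 0);

  return {
    scales: scales.length > 0 ? scales : defaultScales,
    providers: providers.length > 0 ? providers : null,
  };
}

// Assumption: total size = fixed ground truth + 50 noise memories per unit of scale.
function datasetSize(scale: number, groundTruthCount: number, noisePerScale = 50): number {
  return groundTruthCount + scale * noisePerScale;
}
```

Under these assumptions, `BENCH_SCALES=1,10,100` yields three runs, and a dataset with 20 ground-truth memories would hold 70, 520, and 5020 memories respectively.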
+ ## Current Providers
+
+ - `1mbrain-sqlite-vector`: SQLite storage with vector-only recall.
+ - `1mbrain-sqlite-graph`: SQLite storage with spreading activation enabled.
+ - `1mbrain-sqlite-vector-bulk`: SQLite vector recall after direct bulk load, bypassing
+ `remember()` auto-association.
+ - `1mbrain-sqlite-graph-bulk`: SQLite graph recall after direct bulk load plus explicit
+ dataset associations only.
+ - `qdrant-vector`: Qdrant vector recall, enabled only when `QDRANT_URL` is set.
+
+ ## Qdrant Local Benchmark
+
+ Start Qdrant with Docker Compose:
+
+ ```bash
+ docker compose --profile qdrant up -d qdrant
+ ```
+
+ Then run:
+
+ ```bash
+ QDRANT_URL=http://localhost:6333 npm run bench --workspace=packages/benchmarks
+ ```
+
+ Optional settings:
+
+ ```bash
+ QDRANT_COLLECTION=one_million_brain_bench
+ QDRANT_API_KEY=...
+ ```
+
+ If `QDRANT_URL` is not set or the server is unreachable, the Qdrant provider is
+ skipped and the local 1MBrain providers still run.
+
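The "skip when unset" behavior described above can be sketched as a small configuration reader. This is an illustration only, not the code in `src/adapters/qdrant.ts`: `qdrantConfigFromEnv`, the `QdrantConfig` shape, and the fallback collection name `"benchmark"` are hypothetical. The reachability check (an actual request to the server) is deliberately left out.

```typescript
// Hypothetical shape for illustration; the package's adapter may differ.
interface QdrantConfig {
  url: string;
  collection: string;
  apiKey?: string;
}

// Returns null when QDRANT_URL is missing or blank, which a caller can
// treat as "skip the Qdrant provider and run the local ones only".
function qdrantConfigFromEnv(
  env: Record<string, string | undefined>,
  fallbackCollection = "benchmark", // assumed default, not taken from the package
): QdrantConfig | null {
  const url = (env.QDRANT_URL ?? "").trim();
  if (url === "") return null;

  const collection = (env.QDRANT_COLLECTION ?? "").trim() || fallbackCollection;
  const apiKey = (env.QDRANT_API_KEY ?? "").trim();

  // Strip trailing slashes so request paths can be appended consistently.
  const config: QdrantConfig = { url: url.replace(/\/+$/, ""), collection };
  if (apiKey !== "") config.apiKey = apiKey;
  return config;
}
```

A runner could call this with `process.env` at startup and register the `qdrant-vector` provider only when the result is not `null`.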
+ ## Metrics
+
+ - `recallAt5`, `recallAt10`: fraction of expected memories retrieved in the top K results (K = 5 or 10).
+ - `mrrAt10`: reciprocal rank of the first correct hit within the top 10, averaged over queries.
+ - `ndcgAt10`: rank-sensitive quality score over all expected hits in the top 10.
+ - `p50Ms`, `p95Ms`: median and 95th-percentile recall latency, in milliseconds.
+ - `setupMs`: dataset load and association creation time, in milliseconds.
+
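For readers unfamiliar with these metrics, the following TypeScript sketch shows one common way to compute them for a single query. It is not the code in `src/metrics.ts`. It assumes binary relevance (a memory is either expected or not), a log2 discount for nDCG, and the nearest-rank method for percentiles. All function names are made up for the example.

```typescript
// ranked: memory ids returned by a provider, best first.
// expected: ground-truth memory ids for the query.

function recallAtK(ranked: string[], expected: Set<string>, k: number): number {
  if (expected.size === 0) return 0;
  const topK = new Set(ranked.slice(0, k));
  let hits = 0;
  expected.forEach((id) => {
    if (topK.has(id)) hits += 1;
  });
  return hits / expected.size;
}

// Per-query reciprocal rank; averaging this over all queries gives MRR.
function reciprocalRankAtK(ranked: string[], expected: Set<string>, k: number): number {
  const index = ranked.slice(0, k).findIndex((id) => expected.has(id));
  return index === -1 ? 0 : 1 / (index + 1);
}

// Binary-relevance nDCG: each expected hit at 1-based rank r contributes 1 / log2(r + 1).
function ndcgAtK(ranked: string[], expected: Set<string>, k: number): number {
  const seen = new Set<string>();
  let dcg = 0;
  ranked.slice(0, k).forEach((id, i) => {
    if (expected.has(id) && !seen.has(id)) {
      seen.add(id); // count each expected memory at most once
      dcg += 1 / Math.log2(i + 2);
    }
  });
  // Ideal ranking places every expected memory first.
  let idcg = 0;
  for (let i = 0; i < Math.min(expected.size, k); i += 1) {
    idcg += 1 / Math.log2(i + 2);
  }
  return idcg === 0 ? 0 : dcg / idcg;
}

// Nearest-rank percentile over latency samples in milliseconds.
function percentile(samples: number[], p: number): number {
  if (samples.length === 0) return NaN;
  const sorted = samples.slice().sort((a, b) => a - b);
  const rank = Math.max(1, Math.ceil((p / 100) * sorted.length));
  return sorted[rank - 1];
}
```

For example, with `ranked = ["a", "x", "b", "y"]` and expected memories `a`, `b`, `c`, recall at 5 is 2/3 and the reciprocal rank is 1, because the first result is already a correct hit.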
+ Results are written to `packages/benchmarks/results/*.json` and `*.csv`.
+
+ ## Interpreting Setup Modes
+
+ The non-bulk 1MBrain providers use `MemoryEngine.remember()`, which includes embedding,
+ event emission, and auto-association. This is closest to normal runtime writes.
+
+ The `*-bulk` providers write directly through `DatabaseProvider.bulkCreateMemories()`
+ and `bulkCreateAssociations()`. This isolates retrieval/storage behavior and makes
+ the setup comparison fairer against vector databases that use batch upsert.