@1mbrain/benchmarks 0.1.1
- package/README.md +85 -0
- package/fixtures/1mbrain-focused-mini/1mbrain-focused-mini.json +928 -0
- package/fixtures/1mbrain-focused-mini/README.md +45 -0
- package/fixtures/adversarial-memory/dataset_claude_adversarial.json +3333 -0
- package/fixtures/adversarial-memory/dataset_gemini_adversarial_memory.json +2984 -0
- package/fixtures/balanced-mini/dataset_claude_balanced_mini.json +2077 -0
- package/fixtures/balanced-mini/dataset_gemini_balanced_mini.json +1995 -0
- package/fixtures/generate_datasets.js +1741 -0
- package/fixtures/graph-stress-hard/README.md +43 -0
- package/fixtures/graph-stress-hard/dataset_graph_stress_hard.json +4374 -0
- package/fixtures/graph-stress-hard/generate_graph_stress_hard.js +526 -0
- package/fixtures/realistic-medium/dataset_claude_realistic_medium.json +7462 -0
- package/fixtures/realistic-medium/dataset_gemini_realistic_medium.json +7277 -0
- package/fixtures/realistic-medium/gen_claude_medium.js +600 -0
- package/package.json +22 -0
- package/reports/benchmark_report.md +48 -0
- package/reports/benchmark_report_claude_adversarial.md +42 -0
- package/reports/benchmark_report_claude_adversarial_adaptive.md +42 -0
- package/reports/benchmark_report_claude_adversarial_adaptive2_fast.md +42 -0
- package/reports/benchmark_report_claude_adversarial_adaptive_fast.md +42 -0
- package/reports/benchmark_report_claude_adversarial_rerank.md +42 -0
- package/reports/benchmark_report_claude_balanced_mini.md +42 -0
- package/reports/benchmark_report_claude_balanced_mini_adaptive.md +42 -0
- package/reports/benchmark_report_claude_balanced_mini_adaptive2_fast.md +42 -0
- package/reports/benchmark_report_claude_balanced_mini_adaptive_fast.md +42 -0
- package/reports/benchmark_report_claude_balanced_mini_rerank.md +42 -0
- package/reports/benchmark_report_claude_realistic_medium.md +42 -0
- package/reports/benchmark_report_claude_realistic_medium_adaptive.md +42 -0
- package/reports/benchmark_report_claude_realistic_medium_adaptive2_fast.md +42 -0
- package/reports/benchmark_report_claude_realistic_medium_adaptive_fast.md +42 -0
- package/reports/benchmark_report_claude_realistic_medium_evidence_rerank_local.md +42 -0
- package/reports/benchmark_report_claude_realistic_medium_openai_evidence_rerank.md +41 -0
- package/reports/benchmark_report_claude_realistic_medium_openai_multi_signal.md +41 -0
- package/reports/benchmark_report_claude_realistic_medium_openai_multi_signal_scoped.md +41 -0
- package/reports/benchmark_report_claude_realistic_medium_openai_phase8_no_judge.md +42 -0
- package/reports/benchmark_report_claude_realistic_medium_openai_rankingpolicy.md +41 -0
- package/reports/benchmark_report_claude_realistic_medium_openai_stale_filter.md +41 -0
- package/reports/benchmark_report_claude_realistic_medium_openai_stale_filter_absence_fix.md +41 -0
- package/reports/benchmark_report_claude_realistic_medium_openai_write_time_invalidation.md +41 -0
- package/reports/benchmark_report_claude_realistic_medium_rerank.md +42 -0
- package/reports/benchmark_report_claude_realistic_medium_stale_filter_local.md +42 -0
- package/reports/benchmark_report_graph_stress_hard.md +42 -0
- package/reports/benchmark_report_graph_stress_hard_absence_fix.md +42 -0
- package/reports/benchmark_report_graph_stress_hard_adaptive.md +42 -0
- package/reports/benchmark_report_graph_stress_hard_evidence_rerank.md +42 -0
- package/reports/benchmark_report_graph_stress_hard_multi_signal_current_guardrail.md +42 -0
- package/reports/benchmark_report_graph_stress_hard_multi_signal_guardrail_fixed.md +42 -0
- package/reports/benchmark_report_graph_stress_hard_multi_signal_local.md +42 -0
- package/reports/benchmark_report_graph_stress_hard_multi_signal_scoped_guardrail.md +42 -0
- package/reports/benchmark_report_graph_stress_hard_multi_signal_vector_pure_guardrail.md +42 -0
- package/reports/benchmark_report_graph_stress_hard_phase8_sdk_guardrail.md +42 -0
- package/reports/benchmark_report_graph_stress_hard_rerank.md +42 -0
- package/reports/benchmark_report_graph_stress_hard_stale_filter.md +42 -0
- package/reports/benchmark_report_graph_stress_hard_write_time_invalidation.md +42 -0
- package/results/.gitignore +2 -0
- package/src/adapters/1mbrain.ts +317 -0
- package/src/adapters/keyword-embedding.ts +48 -0
- package/src/adapters/mem0.ts +124 -0
- package/src/adapters/qdrant.ts +214 -0
- package/src/adapters/unavailable.ts +49 -0
- package/src/adapters/vector-baseline.ts +149 -0
- package/src/datasets/focused-mini.ts +158 -0
- package/src/datasets/synthetic-agent-memory.ts +532 -0
- package/src/llm-evaluator.ts +262 -0
- package/src/metrics.ts +482 -0
- package/src/provider.ts +151 -0
- package/src/runner.ts +635 -0
- package/tsconfig.json +10 -0
- package/tsconfig.tsbuildinfo +1 -0
package/README.md ADDED
@@ -0,0 +1,85 @@

# 1MBrain Provider Benchmarks

Provider-level benchmarks compare memory engines without involving an LLM agent.
Hermes or other agents should be tested later as an end-to-end validation layer.

## Run

```bash
npm run build --workspace=packages/core
npm run bench --workspace=packages/benchmarks
```

## Focused Mini Dataset

A small GitHub-friendly fixture is available at
`packages/benchmarks/fixtures/1mbrain-focused-mini/1mbrain-focused-mini.json`.
It contains 5 directed conversations, 41 memory records, and 23 questions that
target 1MBrain-specific behavior without requiring an LLM judge.

Optional dataset scales:

```bash
BENCH_SCALES=1,10,100 npm run bench --workspace=packages/benchmarks
```

Optional provider filter:

```bash
BENCH_PROVIDERS=1mbrain-sqlite-vector-bulk,1mbrain-sqlite-graph-bulk,qdrant-vector npm run bench --workspace=packages/benchmarks
```

Each scale adds 50 noise memories plus the fixed ground-truth memories.
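
Both `BENCH_SCALES` and `BENCH_PROVIDERS` are comma-separated lists. The sketch below shows one way a runner might parse them. The helper names (`splitList`, `parseScales`, `parseProviderFilter`, `selectProviders`) and the default scale of `[1]` are assumptions made for illustration; they are not confirmed to match `src/runner.ts`.

```typescript
// Hypothetical helpers: names and defaults are assumptions, not the package's code.
type Env = Record<string, string | undefined>;

/** Split a comma-separated variable into trimmed, non-empty tokens. */
function splitList(raw: string | undefined): string[] {
  return (raw ?? "")
    .split(",")
    .map((token) => token.trim())
    .filter((token) => token.length > 0);
}

/** Parse BENCH_SCALES such as "1,10,100" into positive integers. */
function parseScales(env: Env, fallback: number[] = [1]): number[] {
  const scales = splitList(env["BENCH_SCALES"]).map(Number);
  if (scales.length === 0) return fallback; // variable unset or empty
  for (const scale of scales) {
    if (!Number.isInteger(scale) || scale <= 0) {
      throw new Error(`Invalid BENCH_SCALES entry: ${scale}`);
    }
  }
  return scales;
}

/** Parse BENCH_PROVIDERS into a set, or null when no filter is requested. */
function parseProviderFilter(env: Env): Set<string> | null {
  const names = splitList(env["BENCH_PROVIDERS"]);
  return names.length > 0 ? new Set(names) : null;
}

/** Keep only the providers named in the filter; keep all when the filter is null. */
function selectProviders(all: string[], filter: Set<string> | null): string[] {
  return filter === null ? all : all.filter((name) => filter.has(name));
}
```

In a Node.js runner the `env` argument would typically be `process.env`. Passing it in as a parameter keeps the helpers easy to test.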

## Current Providers

- `1mbrain-sqlite-vector`: SQLite storage with vector-only recall.
- `1mbrain-sqlite-graph`: SQLite storage with spreading activation enabled.
- `1mbrain-sqlite-vector-bulk`: SQLite vector recall after direct bulk load, bypassing
  `remember()` auto-association.
- `1mbrain-sqlite-graph-bulk`: SQLite graph recall after direct bulk load plus explicit
  dataset associations only.
- `qdrant-vector`: Qdrant vector recall, enabled only when `QDRANT_URL` is set.

## Qdrant Local Benchmark

Start Qdrant with Docker Compose:

```bash
docker compose --profile qdrant up -d qdrant
```

Then run:

```bash
QDRANT_URL=http://localhost:6333 npm run bench --workspace=packages/benchmarks
```

Optional:

```bash
QDRANT_COLLECTION=one_million_brain_bench
QDRANT_API_KEY=...
```

If `QDRANT_URL` is not set or unreachable, the Qdrant provider is skipped and local
1MBrain providers still run.
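
The skip behavior described above can be sketched as a two-step check: read the configuration, then probe the server. Everything here is illustrative. `QdrantConfig`, `ProbeFn`, `readQdrantConfig`, and `resolveQdrant` are hypothetical names, and using the README's example value as the default collection name is an assumption. The probe is injected by the caller, so the sketch makes no claim about how `src/adapters/qdrant.ts` actually reaches the server.

```typescript
// Hypothetical sketch of "skip Qdrant when unset or unreachable".
type Env = Record<string, string | undefined>;

interface QdrantConfig {
  url: string;
  collection: string;
  apiKey?: string;
}

/** Reachability check supplied by the caller, e.g. an HTTP request to the server. */
type ProbeFn = (config: QdrantConfig) => Promise<boolean>;

/** Build the config from the environment, or return null when QDRANT_URL is unset. */
function readQdrantConfig(env: Env): QdrantConfig | null {
  const url = env["QDRANT_URL"]?.trim();
  if (!url) return null;
  return {
    url,
    // Assumed default: the example collection name shown in the README.
    collection: env["QDRANT_COLLECTION"] ?? "one_million_brain_bench",
    apiKey: env["QDRANT_API_KEY"],
  };
}

/** Return the config only if Qdrant is configured and the probe succeeds. */
async function resolveQdrant(env: Env, probe: ProbeFn): Promise<QdrantConfig | null> {
  const config = readQdrantConfig(env);
  if (config === null) return null; // not configured: skip the provider
  try {
    return (await probe(config)) ? config : null;
  } catch {
    return null; // probe failed: treat as unreachable and skip the provider
  }
}
```

A `null` result means the runner would leave `qdrant-vector` out of the provider list and continue with the local 1MBrain providers.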

## Metrics

- `recallAt5`, `recallAt10`: fraction of expected memories retrieved in top K.
- `mrrAt10`: reciprocal rank of the first correct hit.
- `ndcgAt10`: rank-sensitive quality score for all expected hits.
- `p50Ms`, `p95Ms`: recall latency percentiles.
- `setupMs`: dataset load and association creation time.
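
The ranking and latency metrics listed above can be sketched as follows. This is a minimal illustration that assumes binary relevance (a retrieved memory is either expected or not), unique retrieved IDs, and a nearest-rank percentile. The function names are hypothetical, and `src/metrics.ts` may compute these values differently.

```typescript
/** Fraction of expected IDs that appear in the top-k retrieved IDs. */
function recallAtK(retrieved: string[], expected: string[], k: number): number {
  if (expected.length === 0) return 0;
  const topK = new Set(retrieved.slice(0, k));
  const hits = expected.filter((id) => topK.has(id)).length;
  return hits / expected.length;
}

/** Reciprocal rank of the first expected ID within the top k, or 0 if none. */
function mrrAtK(retrieved: string[], expected: string[], k: number): number {
  const wanted = new Set(expected);
  const index = retrieved.slice(0, k).findIndex((id) => wanted.has(id));
  return index === -1 ? 0 : 1 / (index + 1);
}

/** Binary-relevance nDCG: discounted gain divided by the best achievable gain. */
function ndcgAtK(retrieved: string[], expected: string[], k: number): number {
  if (expected.length === 0) return 0;
  const wanted = new Set(expected);
  let dcg = 0;
  retrieved.slice(0, k).forEach((id, i) => {
    if (wanted.has(id)) dcg += 1 / Math.log2(i + 2); // rank i+1 is discounted by log2(rank+1)
  });
  let ideal = 0;
  for (let i = 0; i < Math.min(wanted.size, k); i++) {
    ideal += 1 / Math.log2(i + 2); // all expected IDs ranked first
  }
  return dcg / ideal;
}

/** Nearest-rank percentile of latency samples in milliseconds (p in 0..100). */
function percentile(samples: number[], p: number): number {
  if (samples.length === 0) return NaN;
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.max(1, Math.ceil((p / 100) * sorted.length));
  return sorted[rank - 1];
}
```

Under these definitions, `recallAt5` corresponds to `recallAtK(retrieved, expected, 5)` and `p95Ms` to `percentile(latencies, 95)`.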

Results are written to `packages/benchmarks/results/*.json` and `*.csv`.
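
As a rough illustration of the CSV side of that output, the sketch below serializes one row per provider and scale. The metric column names come from the Metrics list; the `provider` and `scale` columns, the column order, and the names `BenchmarkRow` and `toCsv` are assumptions rather than the package's actual file layout.

```typescript
// Hypothetical row shape: metric names are from the README, the rest is assumed.
interface BenchmarkRow {
  provider: string; // assumed column
  scale: number; // assumed column
  recallAt5: number;
  recallAt10: number;
  mrrAt10: number;
  ndcgAt10: number;
  p50Ms: number;
  p95Ms: number;
  setupMs: number;
}

const COLUMNS: (keyof BenchmarkRow)[] = [
  "provider", "scale", "recallAt5", "recallAt10",
  "mrrAt10", "ndcgAt10", "p50Ms", "p95Ms", "setupMs",
];

/** Quote a field only when it contains a comma, quote, or newline. */
function escapeCsv(value: string | number): string {
  const text = String(value);
  return /[",\n]/.test(text) ? `"${text.replace(/"/g, '""')}"` : text;
}

/** Serialize rows to CSV text with a header line. */
function toCsv(rows: BenchmarkRow[]): string {
  const header = COLUMNS.join(",");
  const lines = rows.map((row) => COLUMNS.map((col) => escapeCsv(row[col])).join(","));
  return [header, ...lines].join("\n");
}
```

The JSON counterpart would simply be `JSON.stringify(rows, null, 2)` over the same row objects.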

## Interpreting Setup Modes

The non-bulk 1MBrain providers use `MemoryEngine.remember()`, which includes embedding,
event emission, and auto-association. This is closest to normal runtime writes.

The `*-bulk` providers write directly through `DatabaseProvider.bulkCreateMemories()`
and `bulkCreateAssociations()`. This isolates retrieval/storage behavior and makes
the setup comparison fairer against vector databases that use batch upsert.
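
The difference between the two setup modes can be sketched as two timed code paths. The method names `remember`, `bulkCreateMemories`, and `bulkCreateAssociations` come from the text above, but every signature, the `MemoryRecord` and `Association` shapes, and the helper names are assumptions made for illustration only.

```typescript
// Method names are from the README; all signatures and shapes below are assumed.
interface MemoryRecord { id: string; content: string; }
interface Association { sourceId: string; targetId: string; }

interface MemoryEngineLike {
  remember(record: MemoryRecord): Promise<void>;
}

interface DatabaseProviderLike {
  bulkCreateMemories(records: MemoryRecord[]): Promise<void>;
  bulkCreateAssociations(links: Association[]): Promise<void>;
}

/** Run a setup step and return its wall-clock duration in milliseconds. */
async function timeSetup(step: () => Promise<void>): Promise<number> {
  const start = Date.now();
  await step();
  return Date.now() - start;
}

/** Runtime-style setup: one remember() call per record, so per-write work is included. */
function setupViaRemember(engine: MemoryEngineLike, records: MemoryRecord[]): Promise<number> {
  return timeSetup(async () => {
    for (const record of records) {
      await engine.remember(record);
    }
  });
}

/** Bulk setup: one batch write for memories, one for explicit associations. */
function setupViaBulk(
  db: DatabaseProviderLike,
  records: MemoryRecord[],
  links: Association[],
): Promise<number> {
  return timeSetup(async () => {
    await db.bulkCreateMemories(records);
    await db.bulkCreateAssociations(links);
  });
}
```

Either function returns a duration that would play the role of `setupMs`. The bulk path issues a fixed number of writes regardless of dataset size, which is what makes it comparable to a batch upsert.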