brainbank 0.6.0 → 0.7.0

package/README.md CHANGED
@@ -20,29 +20,16 @@ BrainBank gives LLMs a long-term memory that persists between sessions.
20
20
 
21
21
  ## Why BrainBank?
22
22
 
23
- Built for a multi-repo codebase that needed unified AI context. Zero infrastructure, zero ongoing cost.
23
+ BrainBank is a **code-aware knowledge engine**, not just a memory layer. It parses your codebase with tree-sitter ASTs, indexes git history and co-edit patterns, and makes everything searchable with hybrid vector + keyword retrieval. Optional packages add conversational memory (`@brainbank/memory`) and MCP integration (`@brainbank/mcp`).
24
24
 
25
- Most AI memory solutions (mem0, Zep, LangMem) require cloud services, external databases, or LLM calls just to store a memory. BrainBank takes a different approach:
26
-
27
- | | **BrainBank** | **mem0** | **Zep** | **LangMem** |
25
+ | | **BrainBank** | **QMD** | **mem0 / Zep** | **LangChain** |
28
26
  |---|:---:|:---:|:---:|:---:|
29
- | Infrastructure | **SQLite file** | Vector DB + cloud | Neo4j + cloud | LangGraph Platform |
30
- | LLM required to write | **No**¹ | Yes | Yes | Yes |
31
- | Code-aware | **19 AST-parsed languages (tree-sitter), git, co-edits** | | | |
32
- | Custom plugins | **`.use()` plugin system** | | | |
33
- | Search | **Vector + BM25 + RRF** | Vector + graph² | Vector + BM25 + graph | Vector only |
34
- | Framework lock-in | **None** | Optional | Zep cloud | LangChain |
35
- | Portable | **Copy one file** | Tied to DB | Tied to cloud | Tied to platform |
36
-
37
- > ¹ mem0 and Zep use LLMs to auto-extract memories from raw text. BrainBank is explicit — you decide what gets stored. Less magic, more control.
38
- >
39
- > ² mem0's graph store (mem0g) is available in the paid platform version.
40
-
41
- **In short:**
42
- - **Code-first** — the only memory layer that understands code structure, git history, and file co-edit relationships
43
- - **Framework-agnostic** — plain TypeScript, works with any agent framework (LangChain, Vercel AI SDK, custom) or none at all. Unopinionated — doesn't force you into a specific pattern
44
- - **$0 memory bill** — no LLM calls to extract/consolidate. You store what you want, BrainBank embeds deterministically
45
- - **Truly portable** — `.brainbank/brainbank.db` is a normal file. Copy it, back it up, `git lfs` it
27
+ | Code-aware (AST) | **19 languages** (tree-sitter) | | | |
28
+ | Git + co-edits | | | | |
29
+ | Search | **Vector + BM25 + RRF** | Vector + reranker | Vector + graph | Vector only |
30
+ | Infra | **SQLite file** | Local GGUF | Vector DB + cloud | Vector DB |
31
+ | Plugins | **`.use()` builder** | | | |
32
+ | Memory | **`@brainbank/memory`** (opt-in) | | **Core feature** | |
46
33
 
47
34
  ### Table of Contents
48
35
 
@@ -61,6 +48,7 @@ Most AI memory solutions (mem0, Zep, LangMem) require cloud services, external d
61
48
  - [Examples](#examples)
62
49
  - [Watch Mode](#watch-mode)
63
50
  - [MCP Server](#mcp-server)
51
+ - [Project Config](#project-config)
64
52
  - [Configuration](#configuration)
65
53
  - [Embedding Providers](#embedding-providers)
66
54
  - [Reranker](#reranker)
@@ -74,7 +62,8 @@ Most AI memory solutions (mem0, Zep, LangMem) require cloud services, external d
74
62
  - [Benchmarks](#benchmarks)
75
63
  - [Search Quality: AST vs Sliding Window](#search-quality-ast-vs-sliding-window)
76
64
  - [Grammar Support](#grammar-support)
77
- - [RAG Retrieval Quality](#rag-retrieval-quality) · [Full Results →](./BENCHMARKS.md)
65
+ - [RAG Retrieval Quality](#rag-retrieval-quality)
66
+ · [Full Results →](./BENCHMARKS.md)
78
67
 
79
68
  ---
80
69
 
@@ -104,17 +93,17 @@ npm install @brainbank/mcp
104
93
 
105
94
  ### Tree-Sitter Grammars
106
95
 
107
- BrainBank uses [tree-sitter](https://tree-sitter.github.io/) for AST-aware code chunking. **JavaScript and TypeScript grammars are included by default.** Other languages require installing the corresponding grammar package:
96
+ BrainBank uses [tree-sitter](https://tree-sitter.github.io/) for AST-aware code chunking. **JavaScript, TypeScript, and Python grammars are included by default.** Other languages require installing the corresponding grammar package:
108
97
 
109
98
  ```bash
110
99
  # Install only the grammars you need
111
- npm install tree-sitter-python tree-sitter-go tree-sitter-rust
100
+ npm install tree-sitter-go tree-sitter-rust tree-sitter-ruby
112
101
  ```
113
102
 
114
103
  If you index a file whose grammar isn't installed, BrainBank will throw a clear error:
115
104
 
116
105
  ```
117
- BrainBank: Grammar 'tree-sitter-python' is not installed. Run: npm install tree-sitter-python
106
+ BrainBank: Grammar 'tree-sitter-go' is not installed. Run: npm install tree-sitter-go
118
107
  ```
119
108
 
120
109
  <details>
@@ -122,11 +111,11 @@ BrainBank: Grammar 'tree-sitter-python' is not installed. Run: npm install tree-
122
111
 
123
112
  | Category | Packages |
124
113
  |----------|----------|
125
- | **Included** | `tree-sitter-javascript`, `tree-sitter-typescript` |
114
+ | **Included** | `tree-sitter-javascript`, `tree-sitter-typescript`, `tree-sitter-python` |
126
115
  | Web | `tree-sitter-html`, `tree-sitter-css` |
127
116
  | Systems | `tree-sitter-go`, `tree-sitter-rust`, `tree-sitter-c`, `tree-sitter-cpp`, `tree-sitter-swift` |
128
117
  | JVM | `tree-sitter-java`, `tree-sitter-kotlin`, `tree-sitter-scala` |
129
- | Scripting | `tree-sitter-python`, `tree-sitter-ruby`, `tree-sitter-php`, `tree-sitter-lua`, `tree-sitter-bash`, `tree-sitter-elixir` |
118
+ | Scripting | `tree-sitter-ruby`, `tree-sitter-php`, `tree-sitter-lua`, `tree-sitter-bash`, `tree-sitter-elixir` |
130
119
  | .NET | `tree-sitter-c-sharp` |
131
120
 
132
121
  </details>
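Reviewer note: the grammar check described in this hunk can be pictured roughly as below. This is a minimal sketch, not BrainBank's code: `BUNDLED_GRAMMARS`, `assertGrammarAvailable`, and the `installed` set are hypothetical names, and only the bundled package list and the error message format are taken from the README text above.

```typescript
// Hypothetical sketch: names are illustrative, not part of the brainbank API.
// Grammars the 0.7.0 README lists as included by default.
const BUNDLED_GRAMMARS: ReadonlySet<string> = new Set([
  'tree-sitter-javascript',
  'tree-sitter-typescript',
  'tree-sitter-python',
]);

/**
 * Throws an error in the message format the README documents when the
 * grammar package is neither bundled nor present in `installed`.
 */
function assertGrammarAvailable(pkg: string, installed: ReadonlySet<string>): void {
  if (BUNDLED_GRAMMARS.has(pkg) || installed.has(pkg)) return;
  throw new Error(
    `BrainBank: Grammar '${pkg}' is not installed. Run: npm install ${pkg}`,
  );
}
```

Under these assumptions, `assertGrammarAvailable('tree-sitter-python', new Set())` returns silently in 0.7.0, while the same call for `tree-sitter-go` throws until that package is installed.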
@@ -275,23 +264,33 @@ BrainBank uses pluggable plugins. Register only what you need with `.use()`:
275
264
  | `docs` | `brainbank/docs` | Document collections (markdown, wikis) |
276
265
 
277
266
  ```typescript
278
- import { BrainBank } from 'brainbank';
267
+ import { BrainBank, OpenAIEmbedding } from 'brainbank';
279
268
  import { code } from 'brainbank/code';
280
269
  import { git } from 'brainbank/git';
281
270
  import { docs } from 'brainbank/docs';
282
271
 
283
- // Pick only the plugins you need
284
- const brain = new BrainBank({ repoPath: '.' })
285
- .use(code())
286
- .use(git())
287
- .use(docs());
272
+ // Each plugin can use a different embedding provider
273
+ const brain = new BrainBank({ repoPath: '.' }) // default: local WASM (384d, free)
274
+ .use(code({ embeddingProvider: new OpenAIEmbedding() })) // code: OpenAI (1536d)
275
+ .use(git()) // git: local (384d)
276
+ .use(docs()); // docs: local (384d)
288
277
 
289
278
  // Index code + git (incremental — only processes changes)
290
279
  await brain.index();
291
280
 
292
- // Index document collections
281
+ // Register and index document collections
293
282
  await brain.addCollection({ name: 'wiki', path: '~/docs', pattern: '**/*.md' });
294
283
  await brain.indexDocs();
284
+
285
+ // Dynamic collections — store anything
286
+ const decisions = brain.collection('decisions');
287
+ await decisions.add(
288
+ 'Use SQLite with WAL mode instead of PostgreSQL. Portable, zero infra.',
289
+ { tags: ['architecture'] }
290
+ );
291
+ const hits = await decisions.search('why not postgres');
292
+
293
+ brain.close();
295
294
  ```
296
295
 
297
296
  ### Collections
@@ -509,32 +508,72 @@ brainbank stats # shows all plugins
509
508
  brainbank kv search slack_messages "deploy" # search slack data
510
509
  ```
511
510
 
512
- #### Advanced: config file
511
+ ---
513
512
 
514
- For fine-grained control, create a `.brainbank/config.ts`:
513
+ ## Project Config
515
514
 
516
- ```typescript
517
- // .brainbank/config.ts
518
- export default {
519
- builtins: ['code', 'docs'], // exclude git (default: all three)
520
- brainbank: { // BrainBank constructor options
521
- dbPath: '.brainbank/brain.db',
515
+ Drop a `.brainbank/config.json` in your repo root. Every `brainbank index` reads it automatically — no CLI flags needed.
516
+
517
+ ```jsonc
518
+ // .brainbank/config.json
519
+ {
520
+ // Which built-in plugins to load (default: all three)
521
+ "plugins": ["code", "git", "docs"],
522
+
523
+ // Per-plugin options
524
+ "code": {
525
+ "embedding": "openai", // use OpenAI embeddings for code
526
+ "maxFileSize": 512000
522
527
  },
523
- };
528
+ "git": {
529
+ "depth": 200 // index last 200 commits
530
+ },
531
+ "docs": {
532
+ "embedding": "perplexity-context",
533
+ "collections": [
534
+ { "name": "docs", "path": "./docs", "pattern": "**/*.md" },
535
+ { "name": "wiki", "path": "~/team-wiki", "pattern": "**/*.md", "ignore": ["drafts/**"] }
536
+ ]
537
+ },
538
+
539
+ // Global defaults
540
+ "embedding": "local", // default for plugins without their own
541
+ "reranker": "qwen3"
542
+ }
524
543
  ```
525
544
 
526
- Everything lives in `.brainbank/` — DB, config, and custom plugins:
545
+ **Embedding keys:** `"local"` (default, free), `"openai"`, `"perplexity"`, `"perplexity-context"`.
546
+
547
+ **Per-plugin embeddings** — each plugin creates its own HNSW index with the correct dimensions. A plugin without an `embedding` key uses the global default.
548
+
549
+ **Docs collections** — registered automatically on every `brainbank index` run. No need for `--docs` flags.
550
+
551
+ **Custom plugins** — auto-discovered from `.brainbank/indexers/`:
527
552
 
528
553
  ```
529
554
  .brainbank/
530
555
  ├── brainbank.db # SQLite database (auto-created)
531
- ├── config.ts # Optional project config
532
- └── indexers/ # Optional custom plugin files
556
+ ├── config.json # Project config (optional)
557
+ └── indexers/ # Custom plugin files (optional)
533
558
  ├── slack.ts
534
559
  └── jira.ts
535
560
  ```
536
561
 
537
- No folder and no config file? The CLI uses the built-in plugins (`code`, `git`, `docs`).
562
+ Custom plugins can also have their own config section:
563
+
564
+ ```jsonc
565
+ {
566
+ "plugins": ["code", "git"],
567
+ "slack": { "embedding": "openai" }, // matched by plugin name
568
+ "jira": { "embedding": "perplexity" }
569
+ }
570
+ ```
571
+
572
+ **Config priority:** CLI flags > `config.json` > auto-resolve from DB > defaults.
573
+
574
+ > `.brainbank/config.ts` (or `.js`, `.mjs`) is still supported for programmatic config with custom plugin instances. JSON is preferred for declarative setups.
575
+
576
+ No config file? The CLI uses all built-in plugins with local embeddings — zero config required.
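Reviewer note: the priority chain stated above (CLI flags > `config.json` > auto-resolve from DB > defaults), combined with the per-plugin fallback rule, might be sketched as follows. All names here (`resolveEmbeddingKey`, `EmbeddingSources`, `readEmbedding`, `exampleConfig`) are hypothetical. The sketch also assumes that a CLI flag overrides per-plugin keys, which the README does not state explicitly.

```typescript
// Hypothetical sketch of the documented priority order; not BrainBank's resolver.
type EmbeddingKey = 'local' | 'openai' | 'perplexity' | 'perplexity-context';

const EMBEDDING_KEYS: readonly string[] = ['local', 'openai', 'perplexity', 'perplexity-context'];

function isEmbeddingKey(value: unknown): value is EmbeddingKey {
  return typeof value === 'string' && EMBEDDING_KEYS.includes(value);
}

/** Reads a valid `embedding` key from an object-shaped value, else undefined. */
function readEmbedding(section: unknown): EmbeddingKey | undefined {
  if (typeof section !== 'object' || section === null) return undefined;
  const key = (section as Record<string, unknown>)['embedding'];
  return isEmbeddingKey(key) ? key : undefined;
}

interface EmbeddingSources {
  cliFlag?: EmbeddingKey;            // e.g. from `--embedding openai`
  config?: Record<string, unknown>;  // parsed .brainbank/config.json
  storedKey?: EmbeddingKey;          // provider key persisted in the DB
}

function resolveEmbeddingKey(plugin: string, sources: EmbeddingSources): EmbeddingKey {
  const config = sources.config ?? {};
  return (
    sources.cliFlag ??                // 1. CLI flag (assumed to win over everything)
    readEmbedding(config[plugin]) ??  // 2a. per-plugin section in config.json
    readEmbedding(config) ??          // 2b. global default in config.json
    sources.storedKey ??              // 3. auto-resolve from DB
    'local'                           // 4. built-in default
  );
}

// Trimmed version of the README's example config.
const exampleConfig: Record<string, unknown> = {
  code: { embedding: 'openai' },
  git: { depth: 200 },
  embedding: 'local',
};
```

With `exampleConfig`, the `code` plugin resolves to `openai` and `git` falls through to the global `local`. With no config and a stored key, the stored key is used.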
538
577
 
539
578
  ---
540
579
 
@@ -708,6 +747,50 @@ const brain = new BrainBank({
708
747
  | **Perplexity** | `PerplexityEmbedding` | 2560 (4b) / 1024 (0.6b) | ~100ms | $0.02/1M tokens |
709
748
  | **Perplexity Context** | `PerplexityContextEmbedding` | 2560 (4b) / 1024 (0.6b) | ~100ms | $0.06/1M tokens |
710
749
 
750
+ #### How It Works
751
+
752
+ BrainBank **auto-resolves** the embedding provider. Set it once → it's stored in the DB → every future run uses the same provider automatically.
753
+
754
+ **Programmatic API** — pass `embeddingProvider` to the constructor:
755
+
756
+ ```typescript
757
+ import { BrainBank, OpenAIEmbedding } from 'brainbank';
758
+
759
+ const brain = new BrainBank({
760
+ repoPath: '.',
761
+ embeddingProvider: new OpenAIEmbedding(), // stored in DB on first index
762
+ });
763
+ ```
764
+
765
+ **CLI** — use the `--embedding` flag on first index:
766
+
767
+ ```bash
768
+ brainbank index . --embedding openai # stores provider_key=openai in DB
769
+ brainbank index . # auto-resolves openai from DB
770
+ brainbank hsearch "auth middleware" # uses the same provider
771
+ ```
772
+
773
+ **MCP** — zero-config. Reads the provider from the DB automatically.
774
+
775
+ > The provider key is persisted in the `embedding_meta` table. Priority on startup: explicit `embeddingProvider` in config > stored `provider_key` in DB > local WASM (default).
776
+
777
+ **Per-plugin override** — each plugin can use a different embedding provider:
778
+
779
+ ```typescript
780
+ import { BrainBank, OpenAIEmbedding } from 'brainbank';
781
+ import { PerplexityContextEmbedding } from 'brainbank';
782
+ import { code } from 'brainbank/code';
783
+ import { git } from 'brainbank/git';
784
+ import { docs } from 'brainbank/docs';
785
+
786
+ const brain = new BrainBank({ repoPath: '.' }) // default: local WASM (384d)
787
+ .use(code({ embeddingProvider: new OpenAIEmbedding() })) // code: OpenAI (1536d)
788
+ .use(git()) // git: local (384d)
789
+ .use(docs({ embeddingProvider: new PerplexityContextEmbedding() })); // docs: Perplexity (2560d)
790
+ ```
791
+
792
+ > Each plugin creates its own HNSW index with the correct dimensions. The global `embeddingProvider` (or local default) is used for any plugin that doesn't specify one.
793
+
711
794
  #### OpenAI
712
795
 
713
796
  ```typescript
@@ -1104,7 +1187,7 @@ BrainBank uses **native tree-sitter** to parse source code into ASTs and extract
1104
1187
 
1105
1188
  For large classes (>80 lines), the chunker descends into the class body and extracts each method as a separate chunk. For unsupported languages, it falls back to a sliding window with overlap.
1106
1189
 
1107
- > Tree-sitter grammars are **optional dependencies**. If a grammar isn't installed, that language falls back to the generic sliding window. Install only the grammars you need: `npm install tree-sitter-ruby tree-sitter-go` etc.
1190
+ > Tree-sitter grammars are **optional dependencies** (except JavaScript, TypeScript, and Python, which are included). If you index a file whose grammar isn't installed, BrainBank throws a clear error with the exact `npm install` command. See [Tree-Sitter Grammars](#tree-sitter-grammars) for the full list.
1108
1191
 
1109
1192
  ### Incremental Indexing
1110
1193
 
@@ -1270,19 +1353,7 @@ BrainBank's hybrid search pipeline (Vector + BM25 → RRF) with Perplexity Conte
1270
1353
 
1271
1354
  The hybrid pipeline improved R@5 by **+26pp over vector-only** retrieval on our custom eval.
1272
1355
 
1273
- #### BrainBank vs QMD (Head-to-Head)
1274
-
1275
- Compared against [QMD](https://github.com/tobi/qmd), a local-first search engine using GGUF models (embeddinggemma-300M + query expansion + reranker) — same corpus, same 20 queries:
1276
-
1277
- | Metric | BrainBank + Reranker | QMD + Reranker |
1278
- |---|:---:|:---:|
1279
- | **R@5** | **83%** | 65% |
1280
- | **MRR** | **0.57** | 0.45 |
1281
- | **Misses** | **1/20** | 6/20 |
1282
-
1283
- > BrainBank wins by +18pp R@5. QMD is competitive on semantic queries (81% vs 94%) and ties on broad queries (83% vs 83%) — impressive for a fully local pipeline with zero API calls.
1284
-
1285
- See **[BENCHMARKS.md](./BENCHMARKS.md)** for full pipeline progression, per-technique impact, QMD comparison details, and reproduction instructions.
1356
+ See **[BENCHMARKS.md](./BENCHMARKS.md)** for full pipeline progression, per-technique impact, and reproduction instructions.
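Reviewer note: for readers unfamiliar with the "Vector + BM25 → RRF" step mentioned in this section, here is a minimal sketch of textbook reciprocal rank fusion. It is not the package's own `reciprocalRankFusion` export, whose signature and constants are not shown in this diff. The function name and the constant `k = 60` (a common convention) are assumptions.

```typescript
// Textbook reciprocal rank fusion; illustrative only, not BrainBank's implementation.
interface FusedHit {
  id: string;
  score: number;
}

/**
 * Fuses several ranked id lists. Each list contributes 1 / (k + rank) per id,
 * with rank starting at 1, so ids ranked highly by several retrievers rise.
 */
function rrfSketch(rankings: string[][], k = 60): FusedHit[] {
  const scores = new Map<string, number>();
  for (const ranking of rankings) {
    ranking.forEach((id, index) => {
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + index + 1));
    });
  }
  return [...scores.entries()]
    .map(([id, score]) => ({ id, score }))
    .sort((a, b) => b.score - a.score);
}

// Example: one ranking from vector search, one from BM25 keyword search.
const fused = rrfSketch([
  ['a', 'b', 'c'], // vector order
  ['b', 'd', 'a'], // BM25 order
]);
```

In this example `b` ends up first because both retrievers rank it near the top, which is the effect the hybrid pipeline relies on.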
1286
1357
 
1287
1358
  #### Running the RAG Eval
1288
1359
 
@@ -542,12 +542,12 @@ interface PluginContext {
542
542
  embedding: EmbeddingProvider;
543
543
  /** Resolved BrainBank config. */
544
544
  config: ResolvedConfig;
545
- /** Create and initialize an HNSW index. */
546
- createHnsw(maxElements?: number): Promise<HNSWIndex>;
545
+ /** Create and initialize an HNSW index. Optionally override dims for per-plugin embeddings. */
546
+ createHnsw(maxElements?: number, dims?: number): Promise<HNSWIndex>;
547
547
  /** Load existing vectors from a SQLite vectors table into an HNSW index + cache. */
548
548
  loadVectors(table: string, idCol: string, hnsw: HNSWIndex, cache: Map<number, Float32Array>): void;
549
- /** Get or create a shared HNSW index by type (e.g. 'code', 'git'). For multi-repo support. */
550
- getOrCreateSharedHnsw(type: string, maxElements?: number): Promise<{
549
+ /** Get or create a shared HNSW index by type (e.g. 'code', 'git'). Optionally override dims for per-plugin embeddings. */
550
+ getOrCreateSharedHnsw(type: string, maxElements?: number, dims?: number): Promise<{
551
551
  hnsw: HNSWIndex;
552
552
  vecCache: Map<number, Float32Array>;
553
553
  isNew: boolean;
@@ -6,10 +6,10 @@ import {
6
6
  isIgnoredDir,
7
7
  isIgnoredFile,
8
8
  isSupported
9
- } from "./chunk-PXK62M5W.js";
9
+ } from "./chunk-FGL32LUJ.js";
10
10
  import {
11
11
  rerank
12
- } from "./chunk-C4KDZGRX.js";
12
+ } from "./chunk-VQ27YUHH.js";
13
13
  import {
14
14
  normalizeBM25,
15
15
  reciprocalRankFusion,
@@ -1443,8 +1443,8 @@ var Initializer = class {
1443
1443
  db,
1444
1444
  embedding,
1445
1445
  config,
1446
- createHnsw: /* @__PURE__ */ __name((maxElements) => new HNSWIndex(
1447
- config.embeddingDims,
1446
+ createHnsw: /* @__PURE__ */ __name((maxElements, dims) => new HNSWIndex(
1447
+ dims ?? config.embeddingDims,
1448
1448
  maxElements ?? config.maxElements,
1449
1449
  config.hnswM,
1450
1450
  config.hnswEfConstruction,
@@ -1461,11 +1461,12 @@ var Initializer = class {
1461
1461
  loadVectors(db, table, idCol, hnsw, cache);
1462
1462
  }
1463
1463
  }, "loadVectors"),
1464
- getOrCreateSharedHnsw: /* @__PURE__ */ __name(async (type, maxElements) => {
1464
+ getOrCreateSharedHnsw: /* @__PURE__ */ __name(async (type, maxElements, dims) => {
1465
1465
  const existing = sharedHnsw.get(type);
1466
1466
  if (existing) return { ...existing, isNew: false };
1467
+ const hnswDims = dims ?? config.embeddingDims;
1467
1468
  const hnsw = await new HNSWIndex(
1468
- config.embeddingDims,
1469
+ hnswDims,
1469
1470
  maxElements ?? config.maxElements,
1470
1471
  config.hnswM,
1471
1472
  config.hnswEfConstruction,
@@ -2428,4 +2429,4 @@ export {
2428
2429
  ContextBuilder,
2429
2430
  BrainBank
2430
2431
  };
2431
- //# sourceMappingURL=chunk-YC4ZQLDN.js.map
2432
+ //# sourceMappingURL=chunk-DI3H6JVZ.js.map
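Reviewer note: the `dims` parameter added to `createHnsw` and `getOrCreateSharedHnsw` in the hunks above follows a plain `dims ?? config.embeddingDims` fallback. The sketch below mirrors that logic with a stand-in index class so it runs without the package. `SketchIndex`, `SketchConfig`, and `makeSketchContext` are hypothetical, the real methods return Promises, and the `shared.set` call is an assumption about code outside the visible hunk.

```typescript
// Stand-in for HNSWIndex; the real class takes more arguments (hnswM, etc.).
class SketchIndex {
  dims: number;
  maxElements: number;
  constructor(dims: number, maxElements: number) {
    this.dims = dims;
    this.maxElements = maxElements;
  }
}

interface SketchConfig {
  embeddingDims: number;
  maxElements: number;
}

function makeSketchContext(config: SketchConfig) {
  const shared = new Map<string, SketchIndex>();
  return {
    // Per-plugin dims win; otherwise fall back to the global embedding dims.
    createHnsw: (maxElements?: number, dims?: number) =>
      new SketchIndex(dims ?? config.embeddingDims, maxElements ?? config.maxElements),

    getOrCreateSharedHnsw: (type: string, maxElements?: number, dims?: number) => {
      const existing = shared.get(type);
      // As in the bundled code, an existing index is returned before `dims` is read.
      if (existing) return { hnsw: existing, isNew: false };
      const hnsw = new SketchIndex(
        dims ?? config.embeddingDims,
        maxElements ?? config.maxElements,
      );
      shared.set(type, hnsw); // assumed: caching happens outside the visible hunk
      return { hnsw, isNew: true };
    },
  };
}

const ctx = makeSketchContext({ embeddingDims: 384, maxElements: 10000 });
const codeIndex = ctx.getOrCreateSharedHnsw('code', undefined, 1536); // OpenAI-sized
const gitIndex = ctx.createHnsw();                                    // local default
```

One consequence visible in the diff: for a shared index, the first caller's `dims` sticks. A later call for the same `type` with a different `dims` gets the existing index back unchanged.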