npm - brainbank - Versions diffs - 0.1.0 → 0.1.3 - Mend

brainbank 0.1.0 → 0.1.3

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (14)

package/README.md +123 -4
package/dist/chunk-RAEBYV75.js +709 -0
package/dist/chunk-RAEBYV75.js.map +1 -0
package/dist/{chunk-YGSEUWLV.js → chunk-TW5NTYYZ.js} +43 -30
package/dist/chunk-TW5NTYYZ.js.map +1 -0
package/dist/cli.js +32 -2
package/dist/cli.js.map +1 -1
package/dist/code.js +1 -1
package/dist/index.d.ts +43 -16
package/dist/index.js +2 -2
package/package.json +22 -2
package/dist/chunk-EDKSKLX4.js +0 -490
package/dist/chunk-EDKSKLX4.js.map +0 -1
package/dist/chunk-YGSEUWLV.js.map +0 -1

package/README.md CHANGED

@@ -27,7 +27,7 @@ Most AI memory solutions (mem0, Zep, LangMem) require cloud services, external d
 |---|:---:|:---:|:---:|:---:|
 | Infrastructure | **SQLite file** | Vector DB + cloud | Neo4j + cloud | LangGraph Platform |
 | LLM required to write | **No**¹ | Yes | Yes | Yes |
-| Code-aware | **30+ languages, git, co-edits** | ✗ | ✗ | ✗ |
+| Code-aware | **19 AST-parsed languages (tree-sitter), git, co-edits** | ✗ | ✗ | ✗ |
 | Custom indexers | **`.use()` plugin system** | ✗ | ✗ | ✗ |
 | Search | **Vector + BM25 + RRF** | Vector only | Vector + graph | Vector only |
 | Framework lock-in | **None** | Optional | Zep cloud | LangChain |
@@ -68,6 +68,9 @@ Most AI memory solutions (mem0, Zep, LangMem) require cloud services, external d
   - [Re-embedding](#re-embedding)
 - [Architecture](#architecture)
   - [Search Pipeline](#search-pipeline)
+- [Benchmarks](#benchmarks)
+  - [Search Quality: AST vs Sliding Window](#search-quality-ast-vs-sliding-window)
+  - [Grammar Support](#grammar-support)
 ---
@@ -148,12 +151,15 @@ BrainBank can be used entirely from the command line — no config file needed.
 ### Indexing
-`index` processes **code files + git history** only. Document collections are indexed separately with `docs`.
+`index` processes **code files + git history** by default. Use `--only` to select specific modules, and `--docs` to include document collections.
 ```bash
 brainbank index [path]                      # Index code + git history
 brainbank index [path] --force              # Force re-index everything
 brainbank index [path] --depth 200          # Limit git commit depth
+brainbank index [path] --only code          # Index only code (skip git)
+brainbank index [path] --only git           # Index only git history
+brainbank index [path] --docs ~/docs        # Include a docs folder
 brainbank docs [--collection <name>]        # Index document collections
 ```
@@ -232,7 +238,7 @@ BrainBank uses pluggable indexers. Register only what you need with `.use()`:
 | Indexer | Import | Description |
 |---------|--------|-------------|
-| `code` | `brainbank/code` | Language-aware code chunking (30+ languages) |
+| `code` | `brainbank/code` | AST-aware code chunking via tree-sitter (19 languages) |
 | `git` | `brainbank/git` | Git commit history, diffs, co-edit relationships |
 | `docs` | `brainbank/docs` | Document collections (markdown, wikis) |
@@ -899,6 +905,24 @@ Instances are cached in memory after first initialization, so subsequent queries
 ## Indexing
+### Code Chunking (tree-sitter)
+BrainBank uses **native tree-sitter** to parse source code into ASTs and extract semantic blocks (functions, classes, methods, interfaces) as individual chunks. Because each chunk covers one named unit, its embedding is more focused than one produced by naive line-based splitting.
+**Supported languages (AST-parsed):**
+| Category | Languages |
+|----------|-----------|
+| Web | TypeScript, JavaScript, HTML, CSS |
+| Systems | Go, Rust, C, C++, Swift |
+| JVM | Java, Kotlin, Scala |
+| Scripting | Python, Ruby, PHP, Lua, Bash, Elixir |
+| .NET | C# |
+For large classes (>80 lines), the chunker descends into the class body and extracts each method as a separate chunk. For unsupported languages, it falls back to a sliding window with overlap.
+> Tree-sitter grammars are **optional dependencies**. If a grammar isn't installed, that language falls back to the generic sliding window. Install only the grammars you need: `npm install tree-sitter-ruby tree-sitter-go` etc.
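The large-class rule described above can be sketched roughly as follows. This is a minimal illustration, not BrainBank's actual chunker: the `Block` and `Chunk` shapes and the `toChunks` function are hypothetical, and the input is assumed to be a list of blocks already extracted from a parse tree. Only the 80-line threshold comes from the text.

```typescript
// Hypothetical shape of a block already extracted from a parse tree.
// Line numbers are 1-based and inclusive.
interface Block {
  name: string;
  kind: "function" | "class" | "method" | "interface";
  startLine: number;
  endLine: number;
  children?: Block[]; // methods of a class, when known
}

interface Chunk {
  name: string;
  kind: Block["kind"];
  startLine: number;
  endLine: number;
}

// Threshold taken from the ">80 lines" rule in the README text.
const MAX_CLASS_LINES = 80;

function toChunks(blocks: Block[], maxClassLines: number = MAX_CLASS_LINES): Chunk[] {
  const chunks: Chunk[] = [];
  for (const block of blocks) {
    const lineCount = block.endLine - block.startLine + 1;
    const methods = block.children ?? [];
    if (block.kind === "class" && lineCount > maxClassLines && methods.length > 0) {
      // Large class: emit one chunk per method instead of one oversized chunk.
      for (const m of methods) {
        chunks.push({
          name: `${block.name}.${m.name}`,
          kind: m.kind,
          startLine: m.startLine,
          endLine: m.endLine,
        });
      }
    } else {
      // Small class, function, or interface: keep it as a single chunk.
      chunks.push({
        name: block.name,
        kind: block.kind,
        startLine: block.startLine,
        endLine: block.endLine,
      });
    }
  }
  return chunks;
}

// Illustrative input: a 200-line class with two methods, plus a short function.
const exampleBlocks: Block[] = [
  {
    name: "JobsService",
    kind: "class",
    startLine: 10,
    endLine: 209,
    children: [
      { name: "findAll", kind: "method", startLine: 20, endLine: 60 },
      { name: "cancelJob", kind: "method", startLine: 62, endLine: 120 },
    ],
  },
  { name: "formatDate", kind: "function", startLine: 215, endLine: 230 },
];

console.log(toChunks(exampleBlocks).map((c) => c.name));
```

Prefixing method names with the class name is a design choice made here for readability; the source does not say how BrainBank names method chunks.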
 ### Incremental Indexing
 All indexing is **incremental by default** — only new or changed content is processed:
@@ -971,6 +995,101 @@ brainbank reembed
 ---
+## Benchmarks
+BrainBank includes benchmark scripts to validate chunking quality and search relevance. Run them against your own codebase to see the impact.
+### Search Quality: AST vs Sliding Window
+We compared BrainBank's **tree-sitter AST chunker** against the traditional **sliding window** (80-line blocks) on a production NestJS backend (3,753 lines across 8 service files). Both strategies chunk the same files; all chunks are embedded and searched with the same 10 domain-specific queries.
+#### How It Works
+```
+Sliding Window                          Tree-Sitter AST
+┌────────────────────┐                  ┌────────────────────┐
+│ import { ... }     │                  │ ✓ constructor()    │  → named chunk
+│ @Injectable()      │  → L1-80 block   │ ✓ findAll()        │  → named chunk
+│ class JobsService {│                  │ ✓ createJob()      │  → named chunk
+│   constructor()    │                  │ ✓ cancelJob()      │  → named chunk
+│   findAll() { ... }│                  │ ✓ updateStatus()   │  → named chunk
+│   createJob()      │                  └────────────────────┘
+│   ...              │
+│ ────────────────── │  overlaps ↕
+│   cancelJob()      │  → L76-155 block
+│   updateStatus()   │
+│   ...              │
+└────────────────────┘
+```
+**Sliding window** mixes imports, constructors, and multiple methods into one embedding. Search for "cancel a job" and you get a generic block.
+**AST chunking** gives each method its own embedding. Search for "cancel a job" → direct hit on `cancelJob()`.
+#### Results (Production NestJS Backend — 3,753 lines)
+Tested with 10 domain-specific queries on 8 service files (`orders.service.ts`, `bookings.service.ts`, `notifications.service.ts`, etc.):
+| Metric | Sliding Window | Tree-Sitter AST |
+|--------|:-:|:-:|
+| **Query Wins** | 0/10 | **8/10** (2 ties) |
+| **Top-1 Relevant** | 3/10 | **8/10** |
+| **Avg Precision@3** | 1.1/3 | **1.7/3** |
+| **Avg Score Delta** | — | **+0.035** |
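The "Avg Precision@3" row reads as the average number of relevant results among the top 3, per query. A minimal sketch of that calculation is shown below; the function names and the sample data are hypothetical, and the source does not describe how relevance was judged in the benchmark.

```typescript
// One benchmark query: the ranked result ids and the ids judged relevant.
interface QueryRun {
  ranked: string[];
  relevant: Set<string>;
}

// Count how many of the top-k results are relevant for one query.
function relevantInTopK(run: QueryRun, k: number = 3): number {
  return run.ranked.slice(0, k).filter((id) => run.relevant.has(id)).length;
}

// Average that count over all queries; 1.7 would read as "1.7 of 3" on average.
function averageRelevantInTopK(runs: QueryRun[], k: number = 3): number {
  if (runs.length === 0) return 0;
  const total = runs.reduce((sum, run) => sum + relevantInTopK(run, k), 0);
  return total / runs.length;
}

// Hypothetical data, not taken from the benchmark.
const sampleRuns: QueryRun[] = [
  { ranked: ["cancelJob", "findAll", "updateStatus", "createJob"], relevant: new Set(["cancelJob", "updateStatus", "createJob"]) },
  { ranked: ["findAll", "createJob", "cancelJob"], relevant: new Set(["sendEmail"]) },
];

console.log(averageRelevantInTopK(sampleRuns));
```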
+#### Per-Query Breakdown
+| Query | SW Top Result | AST Top Result | Δ Score |
+|-------|:---:|:---:|:---:|
+| cancel an order | generic `L451-458` | **`updateOrderStatus`** | +0.005 |
+| create a booking | generic `L451-458` | **`createInstantBooking`** | +0.068 |
+| confirm booking | generic `L451-458` | **`confirm`** | +0.034 |
+| send notification | generic `L226-305` | **`publishNotificationEvent`** | +0.034 |
+| authenticate JWT | generic `L1-80` | **`AuthModule`** | +0.032 |
+| tenant DB connection | `L76-155` | **`onModuleDestroy`** | +0.037 |
+| list orders paginated | `L76-155` | **`findAllActive`** | +0.045 |
+| reject booking | generic `L451-458` | **`reject`** | +0.090 |
+> Notice how the sliding window returns the **same generic block `L451-458`** for 4 different queries. The AST chunker returns a different, correctly named method each time.
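The block labels in the table (`L1-80`, `L76-155`, `L226-305`, `L451-458`) are consistent with an 80-line window that advances 75 lines at a time over a 458-line file. That step size and file length are inferred from the labels, not stated in the source, so the sketch below is an assumption about how such ranges could arise rather than BrainBank's code. `slidingWindows` and `LineRange` are hypothetical names.

```typescript
// A line range, 1-based and inclusive on both ends.
interface LineRange {
  start: number;
  end: number;
}

// Split a file of `totalLines` lines into fixed-size windows that overlap.
// windowSize = 80 comes from the text; overlap = 5 is inferred from the labels.
function slidingWindows(totalLines: number, windowSize: number = 80, overlap: number = 5): LineRange[] {
  if (overlap >= windowSize) {
    throw new Error("overlap must be smaller than windowSize");
  }
  const step = windowSize - overlap;
  const ranges: LineRange[] = [];
  for (let start = 1; start <= totalLines; start += step) {
    const end = Math.min(start + windowSize - 1, totalLines);
    ranges.push({ start, end });
    if (end === totalLines) break; // the last window reached the end of the file
  }
  return ranges;
}

// Assumed 458-line file, chosen to match the final label in the table.
const windowLabels = slidingWindows(458).map((r) => `L${r.start}-${r.end}`);
console.log(windowLabels.join(", "));
```

Under these assumptions the final window covers only 8 lines, which would explain why a short trailing block can surface as a generic match for unrelated queries.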
+#### Chunk Quality Comparison
+| | Sliding Window | Tree-Sitter AST |
+|---|:-:|:-:|
+| Total chunks | 53 | **83** |
+| Avg lines/chunk | 75 | **39** |
+| Named chunks | 0 | **83** (100%) |
+| Chunk types | `block` | `method`, `interface`, `class` |
+### Grammar Support
+All 9 core grammars verified, each parsing in **≤0.05ms**:
+| Language | AST Nodes Extracted | Parse Time |
+|----------|:---:|:---:|
+| TypeScript | `export_statement`, `interface_declaration` | 0.04ms |
+| JavaScript | `function_declaration` × 3 | 0.04ms |
+| Python | `class_definition`, `function_definition` × 2 | 0.03ms |
+| Go | `function_declaration`, `method_declaration` × 3 | 0.04ms |
+| Rust | `struct_item`, `impl_item`, `function_item` | 0.03ms |
+| Ruby | `class`, `method` | 0.03ms |
+| Java | `class_declaration` | 0.02ms |
+| C | `function_definition` × 3 | 0.05ms |
+| PHP | `class_declaration` | 0.03ms |
+> Additional grammars available: C++, Swift, C#, Kotlin, Scala, Lua, Elixir, Bash, HTML, CSS
+### Running Benchmarks
+```bash
+# Grammar support (9 languages, parse speed)
+node test/benchmarks/grammar-support.mjs
+# Search quality A/B (uses BrainBank's own source files)
+node test/benchmarks/search-quality.mjs
+```
+---
 ## Architecture
 <details>
@@ -1035,7 +1154,7 @@ Final results (sorted by blended score)
 ### Data Flow
-1. **Index** — Indexers parse files into chunks
+1. **Index** — Indexers parse files into chunks (tree-sitter AST for code, heading-based for docs)
 2. **Embed** — Each chunk gets a vector (local WASM or OpenAI)
 3. **Store** — Chunks + vectors → SQLite, vectors → HNSW index
 4. **Search** — Query → HNSW k-NN + BM25 keyword → RRF fusion → optional reranker
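The RRF fusion in the search step can be illustrated with the standard Reciprocal Rank Fusion formula, where each result scores the sum of `1 / (k + rank)` over every ranked list it appears in. This is a generic sketch, not BrainBank's implementation: `rrfFuse` is a hypothetical name, and `k = 60` is the value commonly used for RRF, which may differ from what the package uses.

```typescript
interface FusedResult {
  id: string;
  score: number;
}

// Reciprocal Rank Fusion: score(d) = sum over lists of 1 / (k + rank(d)),
// with rank starting at 1. Results missing from a list contribute nothing for it.
function rrfFuse(rankings: string[][], k: number = 60): FusedResult[] {
  const scores = new Map<string, number>();
  for (const ranking of rankings) {
    ranking.forEach((id, index) => {
      const rank = index + 1;
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + rank));
    });
  }
  return Array.from(scores.entries())
    .map(([id, score]) => ({ id, score }))
    .sort((a, b) => b.score - a.score);
}

// Hypothetical result lists standing in for the vector and BM25 searches.
const vectorHits = ["cancelJob", "updateStatus", "findAll"];
const keywordHits = ["findAll", "cancelJob", "createJob"];

const fused = rrfFuse([vectorHits, keywordHits]);
console.log(fused.map((r) => r.id));
```

Because RRF uses only ranks, it can combine cosine similarities and BM25 scores without normalizing them to a common scale.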