@theglitchking/semantic-pages 0.4.3 → 0.4.4
- package/README.md +25 -11
- package/dist/mcp/server.js +9 -3
- package/dist/mcp/server.js.map +1 -1
- package/package.json +1 -1
package/README.md
CHANGED
@@ -6,7 +6,7 @@
 [](https://opensource.org/licenses/MIT)
 
 > [!IMPORTANT]
-> Semantic Pages runs a local embedding model (~
+> Semantic Pages runs a local embedding model (~22MB) on first launch. This download happens once and is cached at `~/.semantic-pages/models/`. No API key required. No data leaves your machine.
 
 ---
 
@@ -18,7 +18,7 @@ When you have markdown notes scattered across a project — a `vault/`, `docs/`,
 
 ## Operational Summary
 
-The server indexes all `.md` files in a directory you point it at. Each file is parsed for YAML frontmatter, `[[wikilinks]]`, `#tags`, and headings. The text content is split into
+The server indexes all `.md` files in a directory you point it at. Each file is parsed for YAML frontmatter, `[[wikilinks]]`, `#tags`, and headings. The text content is split into chunks and embedded locally using `all-MiniLM-L6-v2` — a 22MB model that runs natively in Node.js via ONNX. These embeddings are stored in an HNSW index for fast approximate nearest neighbor search. Simultaneously, a directed graph is built from wikilinks and shared tags using graphology.
 
 When Claude calls `search_semantic`, the query is embedded and compared against all chunks via cosine similarity. When Claude calls `search_graph`, it does a breadth-first traversal from matching nodes. `search_hybrid` combines both — semantic results re-ranked by graph proximity. Beyond search, Claude can create, read, update, delete, and move notes, manage YAML frontmatter fields, add/remove/rename tags vault-wide, and query the knowledge graph for backlinks, forwardlinks, shortest paths, and connectivity statistics.
 
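Reviewer note: the README text in this hunk says `search_semantic` embeds the query and compares it against all chunks via cosine similarity. When vectors are normalized to unit length, cosine similarity reduces to a plain dot product. The sketch below illustrates that idea as a brute-force scan. It is not the package's code (the README says the real index is HNSW-backed), and the helper names `normalize`, `dot`, `topK` and the sample chunk ids are hypothetical.

```javascript
// Scale a vector to unit length; returns a new Float32Array.
function normalize(vec) {
  let sumSq = 0;
  for (const x of vec) sumSq += x * x;
  const norm = Math.sqrt(sumSq);
  if (norm === 0) return Float32Array.from(vec); // leave zero vectors unchanged
  return Float32Array.from(vec, (x) => x / norm);
}

// Dot product; equals cosine similarity when both inputs are unit length.
function dot(a, b) {
  if (a.length !== b.length) throw new Error("dimension mismatch");
  let sum = 0;
  for (let i = 0; i < a.length; i++) sum += a[i] * b[i];
  return sum;
}

// Score every chunk against the query and keep the k best matches.
function topK(query, chunks, k) {
  return chunks
    .map((chunk) => ({ id: chunk.id, score: dot(query, chunk.vector) }))
    .sort((a, b) => b.score - a.score)
    .slice(0, k);
}

// Toy 3-dimensional "embeddings" (assumed data, for illustration only).
const chunks = [
  { id: "auth.md#0", vector: normalize([1, 0, 0]) },
  { id: "overview.md#0", vector: normalize([1, 1, 0]) },
  { id: "deploy.md#0", vector: normalize([0, 0, 1]) },
];
const results = topK(normalize([1, 0.1, 0]), chunks, 2);
console.log(results.map((r) => r.id));
```

A linear scan like this costs one dot product per chunk, which is why an approximate index such as HNSW is the usual choice once the chunk count grows.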
@@ -223,7 +223,7 @@ semantic-pages --notes ./vault --reindex
 - If the index seems stale or corrupted
 - After changing the embedding model
 
-**What to expect**: Full re-parse, re-embed, and re-index of all markdown files. Takes
+**What to expect**: Full re-parse, re-embed, and re-index of all markdown files. Takes 30 seconds to ~20 minutes depending on vault size and hardware. See [Performance Tuning](./.documentation/performance-tuning.md) for details.
 
 ---
 
@@ -480,8 +480,8 @@ src/
 | Markdown parsing | `unified` + `remark-parse` | AST-based, handles wikilinks |
 | Frontmatter | `gray-matter` | YAML/TOML frontmatter extraction |
 | Wikilinks | `remark-wiki-link` | `[[note-name]]` extraction from AST |
-| Embeddings | `@huggingface/transformers` |
-| Embedding model | `
+| Embeddings | `@huggingface/transformers` + `onnxruntime-node` | Native ONNX runtime, no Python, no API key |
+| Embedding model | `all-MiniLM-L6-v2` (default) | ~22MB, fast (~3 min / 3K chunks), excellent retrieval quality |
 | Vector index | `hnswlib-node` | HNSW algorithm, same as production vector DBs |
 | Knowledge graph | `graphology` | Directed graph, serializable, rich algorithms |
 | Graph algorithms | `graphology-traversal` + `graphology-shortest-path` | BFS, shortest path |
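Reviewer note: the table in this hunk lists BFS among the graph algorithms, and the README describes `search_graph` as a breadth-first traversal from matching nodes. A minimal sketch of a depth-limited BFS over a wikilink graph might look like the following. It uses a plain `Map` rather than graphology, and the `links` data, the `bfs` function, and the `maxDepth` parameter are all assumptions made for illustration.

```javascript
// Hypothetical wikilink graph: each note maps to the notes it links to.
const links = new Map([
  ["overview.md", ["auth.md", "deploy.md"]],
  ["auth.md", ["tokens.md"]],
  ["deploy.md", []],
  ["tokens.md", ["overview.md"]],
]);

// Breadth-first traversal from `start`, stopping after `maxDepth` hops.
// Returns a Map of reached note -> hop distance from the start note.
function bfs(graph, start, maxDepth) {
  const depthOf = new Map([[start, 0]]);
  const queue = [start];
  while (queue.length > 0) {
    const node = queue.shift();
    const depth = depthOf.get(node);
    if (depth === maxDepth) continue; // do not expand beyond the limit
    for (const next of graph.get(node) ?? []) {
      if (!depthOf.has(next)) {
        // First visit is the shortest hop count in an unweighted graph.
        depthOf.set(next, depth + 1);
        queue.push(next);
      }
    }
  }
  return depthOf;
}

const reached = bfs(links, "overview.md", 1);
console.log([...reached.keys()]);
```

Because BFS visits nodes in order of hop count, the recorded depth can double as a graph-proximity signal of the kind a hybrid re-ranking step could use.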
@@ -519,7 +519,7 @@ Plain text → split at sentence boundaries → ~512 token chunks
 
 #### Step 3: Embed
 ```
-Each chunk →
+Each chunk → all-MiniLM-L6-v2 (native ONNX) → normalized Float32Array
 ```
 
 #### Step 4: Index
@@ -573,14 +573,16 @@ const path = graph.findPath("overview.md", "auth.md");
 
 | Metric | Value |
 |--------|-------|
-| Index 100 notes | ~
-| Index
+| Index 100 notes (~600 chunks) | ~30 seconds |
+| Index 500 notes (~3,000 chunks) | ~3–5 minutes |
+| Index 2,000 notes (~12,000 chunks) | ~15–20 minutes |
 | Semantic search latency | <100ms |
 | Text search latency | <10ms |
 | Graph traversal latency | <5ms |
-|
-|
-|
+| Subsequent server starts (warm cache) | <1 second |
+| Model download (first run) | ~22MB, cached at `~/.semantic-pages/models/` |
+| Index size (500 notes) | ~30–50MB |
+| npm package size | ~112 kB |
 
 ---
 
@@ -592,6 +594,18 @@ const path = graph.findPath("overview.md", "auth.md");
 
 ---
 
+## Documentation
+
+Deep-dive guides are in [`.documentation/`](./.documentation/):
+
+- [**How It Works**](./.documentation/how-it-works.md) — architecture, processing pipeline, index format, search mechanics
+- [**Performance Tuning**](./.documentation/performance-tuning.md) — model selection, batch size, workers, benchmarks
+- [**Embedder Guide**](./.documentation/embedder-guide.md) — when/how to tune the embedder, model switching, cache management
+- [**Troubleshooting**](./.documentation/troubleshooting.md) — common problems and fixes
+- [**Changelog**](./.documentation/changelog.md) — version history with rationale
+
+---
+
 ## Troubleshooting
 
 ### Installation Issues
package/dist/mcp/server.js
CHANGED
@@ -4472,11 +4472,17 @@ async function createServer(notesPath, options = {}) {
 );
 }
 );
-const cached = await tryLoadCachedIndex();
 if (options.waitForReady) {
+await tryLoadCachedIndex();
 await fullIndex();
-} else
-
+} else {
+tryLoadCachedIndex().then((cached) => {
+if (!cached) backgroundIndex();
+}).catch((err) => {
+process.stderr.write(`Startup error: ${err?.message ?? err}
+`);
+backgroundIndex();
+});
 }
 if (options.watch !== false) {
 const watcher = new Watcher(notesPath);
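Reviewer note: in 0.4.3 the `await tryLoadCachedIndex()` call sat before the branch, so both paths waited for the cache load. In 0.4.4 only the `waitForReady` path awaits it. The default path starts the load without awaiting, falls back to `backgroundIndex()` when no cache is found, and logs to stderr and re-indexes if the load rejects. The sketch below isolates that control flow so it can be exercised on its own. The wrapper `startIndexing` and its dependency-injection shape are hypothetical; only the three callee names come from the diff.

```javascript
// Isolated sketch of the 0.4.4 startup branch. `startIndexing` and the
// injected `deps` object are assumptions; tryLoadCachedIndex, fullIndex
// and backgroundIndex mirror the names visible in the diff.
async function startIndexing(options, deps) {
  const { tryLoadCachedIndex, fullIndex, backgroundIndex } = deps;

  if (options.waitForReady) {
    // Blocking path: the caller asked to wait until the index is built.
    await tryLoadCachedIndex();
    await fullIndex();
    return;
  }

  // Non-blocking path: kick off the cache load but do not await it,
  // so startup can continue while the load is still in flight.
  tryLoadCachedIndex()
    .then((cached) => {
      if (!cached) backgroundIndex(); // no usable cache: index in background
    })
    .catch((err) => {
      // A failed load is reported, then treated like a cache miss.
      process.stderr.write(`Startup error: ${err?.message ?? err}\n`);
      backgroundIndex();
    });
}
```

One property worth noting when reviewing: because `.catch` is chained after `.then`, it also handles a synchronous throw from `backgroundIndex()` inside the `.then` callback, in which case `backgroundIndex()` would be invoked a second time.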