@theglitchking/semantic-pages 0.1.3 → 0.3.0

package/README.md CHANGED
@@ -1,25 +1,89 @@
1
- # semantic-pages
1
+ # Semantic Pages
2
2
 
3
- Semantic search + knowledge graph MCP server for any folder of markdown files.
3
+ > Semantic search + knowledge graph MCP server for any folder of markdown files.
4
4
 
5
- No Docker. No Python. No Obsidian. Just `npx`.
5
+ [![npm version](https://img.shields.io/npm/v/@theglitchking/semantic-pages.svg)](https://www.npmjs.com/package/@theglitchking/semantic-pages)
6
+ [![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
6
7
 
7
- ## Install
8
+ > [!IMPORTANT]
9
+ > Semantic Pages runs a local embedding model (~80MB) on first launch. This download happens once and is cached at `~/.semantic-pages/models/`. No API key required. No data leaves your machine.
8
10
 
11
+ ---
12
+
13
+ ## Summary
14
+
15
+ When you have markdown notes scattered across a project — a `vault/`, `docs/`, `notes/`, or wiki — your AI assistant can't search them by meaning, traverse their connections, or help you maintain them. Semantic Pages fixes this by indexing your markdown files into a vector database and knowledge graph, then exposing 21 MCP tools that let Claude (or any MCP-compatible client) search semantically, traverse wikilinks, manage frontmatter, and perform full CRUD operations. No Docker, no Python, no Obsidian required — just `npx`.
16
+
17
+ ---
18
+
19
+ ## Operational Summary
20
+
21
+ The server indexes all `.md` files in a directory you point it at. Each file is parsed for YAML frontmatter, `[[wikilinks]]`, `#tags`, and headings. The text content is split into ~512-token chunks and embedded locally using the `nomic-embed-text-v1.5` model running via WebAssembly in Node.js. These embeddings are stored in an HNSW index for fast approximate nearest neighbor search. Simultaneously, a directed graph is built from wikilinks and shared tags using graphology.
22
+
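To make the parsing step above concrete, here is a minimal sketch of pulling `[[wikilinks]]` and inline `#tags` out of raw markdown. It uses plain regular expressions, whereas the README says the package parses an AST with remark, so treat `extractWikilinks` and `extractInlineTags` as hypothetical helpers rather than the package's API.

```typescript
// Hypothetical helpers for illustration; the real package uses remark, not regexes.
function extractWikilinks(markdown: string): string[] {
  const links: string[] = [];
  // Matches [[target]], [[target|alias]] and [[target#heading]]; captures the target only.
  const pattern = /\[\[([^\]|#]+)(?:#[^\]|]*)?(?:\|[^\]]*)?\]\]/g;
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(markdown)) !== null) {
    const target = match[1].trim();
    if (links.indexOf(target) === -1) links.push(target);
  }
  return links;
}

function extractInlineTags(markdown: string): string[] {
  const tags: string[] = [];
  // "#" must follow whitespace (or start the text) and be followed by a letter,
  // so markdown headings such as "# Title" are not treated as tags.
  const pattern = /(^|\s)#([A-Za-z][\w\/-]*)/g;
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(markdown)) !== null) {
    if (tags.indexOf(match[2]) === -1) tags.push(match[2]);
  }
  return tags;
}

const sample = "# Deploy\nSee [[user-service]] and [[api-gateway|the gateway]]. #devops #arch";
const foundLinks = extractWikilinks(sample);
const foundTags = extractInlineTags(sample);
console.log(foundLinks, foundTags);
```

A regex pass like this will also match links and tags inside code fences, which is one reason an AST-based parser is the more robust choice.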
23
+ When Claude calls `search_semantic`, the query is embedded and compared against all chunks via cosine similarity. When Claude calls `search_graph`, it does a breadth-first traversal from matching nodes. `search_hybrid` combines both — semantic results re-ranked by graph proximity. Beyond search, Claude can create, read, update, delete, and move notes, manage YAML frontmatter fields, add/remove/rename tags vault-wide, and query the knowledge graph for backlinks, forwardlinks, shortest paths, and connectivity statistics.
24
+
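The cosine-similarity comparison described above can be sketched as a brute-force ranking. This is only an illustration: the README says the package queries an HNSW index for approximate results, and the `rankChunks` helper and the three-dimensional toy vectors below are assumptions made for the example.

```typescript
// Cosine similarity between two equal-length vectors.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

interface ScoredChunk { docPath: string; score: number; }

// Hypothetical exhaustive ranking: score every chunk, keep the best `limit`.
function rankChunks(
  query: number[],
  chunks: { docPath: string; vector: number[] }[],
  limit: number
): ScoredChunk[] {
  return chunks
    .map(c => ({ docPath: c.docPath, score: cosineSimilarity(query, c.vector) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, limit);
}

const toyChunks = [
  { docPath: "scaling.md", vector: [0.9, 0.1, 0.0] },
  { docPath: "recipes.md", vector: [0.0, 0.2, 0.9] },
  { docPath: "kubernetes.md", vector: [0.7, 0.6, 0.1] },
];
const ranked = rankChunks([1, 0, 0], toyChunks, 2);
console.log(ranked);
```

Exhaustive scoring costs one comparison per chunk per query; an HNSW index trades a small amount of recall for much faster lookups on large vaults.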
25
+ The index is stored in `.semantic-pages-index/` alongside your notes (add it to `.gitignore`). A file watcher detects changes and re-indexes incrementally. Everything runs locally over stdio: no network traffic after the first model download, no HTTP server, and no background processes beyond the MCP connection itself.
26
+
27
+ ---
28
+
29
+ ## Features
30
+
31
+ - **Semantic Search**: Find notes by meaning, not just keywords, using local vector embeddings
32
+ - **Knowledge Graph**: Traverse `[[wikilinks]]` and shared `#tags` as a directed graph
33
+ - **Hybrid Search**: Combined vector + graph search with re-ranking
34
+ - **Full-Text Search**: Keyword and regex search with path, tag, and case filters
35
+ - **Full CRUD**: Create, read, update (overwrite/append/prepend/patch-by-heading), delete, and move notes
36
+ - **Frontmatter Management**: Get and set YAML frontmatter fields atomically
37
+ - **Tag Management**: Add, remove, list, and rename tags vault-wide (frontmatter + inline)
38
+ - **Graph Queries**: Backlinks, forwardlinks, shortest path, graph statistics (orphans, density, most connected)
39
+ - **File Watcher**: Incremental re-indexing on file changes with debounce
40
+ - **Local Embeddings**: No API key, no network after first model download
41
+ - **Zero Dependencies Beyond Node**: No Docker, no Python, no Obsidian, no GUI
42
+
43
+ ---
44
+
45
+ ## Quick Start
46
+
47
+ ### 1. Installation Methods
48
+
49
+ #### Method A: NPX (No installation needed)
50
+
51
+ This lets you run the server without installing it permanently.
52
+
53
+ **Step 1**: Open your terminal in your project folder
54
+
55
+ **Step 2**: Run:
9
56
  ```bash
10
- npx semantic-pages --notes ./vault
57
+ npx semantic-pages --notes ./vault --stats
11
58
  ```
12
59
 
13
- Or install globally:
60
+ **Step 3**: The first time you run it, NPX downloads the package and the embedding model (~80MB). This takes 1-2 minutes.
61
+
62
+ **Step 4**: After that, it runs instantly.
63
+
64
+ **Use this method when**: You want to try it out, or you're adding it to a project's `.mcp.json` config.
14
65
 
66
+ #### Method B: Global Installation (Recommended for regular use)
67
+
68
+ This installs the tool on your computer so you can use it in any project.
69
+
70
+ **Step 1**: Open your terminal
71
+
72
+ **Step 2**: Type this command and press Enter:
15
73
  ```bash
16
- npm install -g semantic-pages
17
- semantic-pages --notes ./vault
74
+ npm install -g @theglitchking/semantic-pages
18
75
  ```
19
76
 
20
- ## MCP Configuration
77
+ **Step 3**: Test that it worked:
78
+ ```bash
79
+ semantic-pages --version
80
+ ```
81
+
82
+ **Step 4**: You should see a version number. If you do, it's installed correctly!
21
83
 
22
- Add to your `.mcp.json` (Claude Code, Cursor, etc.):
84
+ #### Method C: MCP Configuration (Recommended for Claude Code)
85
+
86
+ Add to your project's `.mcp.json` so Claude has automatic access:
23
87
 
24
88
  ```json
25
89
  {
@@ -32,105 +96,344 @@ Add to your `.mcp.json` (Claude Code, Cursor, etc.):
32
96
 
33
97
  Point `--notes` at any folder of `.md` files: `./vault`, `./docs`, `./notes`, or `.` for the whole repo.
34
98
 
35
- ## What It Does
99
+ **What to expect**: Next time you run `claude` in that project, Claude will have 21 new tools for searching, reading, writing, and traversing your notes.
36
100
 
37
- semantic-pages gives your AI assistant native tool access to your markdown notes via the Model Context Protocol. It replaces the Obsidian + Smart Connections + MCP plugin stack with a single npm package.
101
+ #### Method D: Project Installation (For team projects)
38
102
 
39
- ### 21 MCP Tools
103
+ This installs the tool only for one specific project.
40
104
 
41
- **Search**
105
+ **Step 1**: Open your terminal in your project folder
42
106
 
43
- | Tool | Description |
44
- |------|-------------|
45
- | `search_semantic` | Vector similarity search by meaning |
46
- | `search_text` | Full-text keyword/regex search with filters |
47
- | `search_graph` | Graph traversal via wikilinks and tags |
48
- | `search_hybrid` | Combined semantic + graph, re-ranked |
107
+ **Step 2**: Type this command:
108
+ ```bash
109
+ npm install --save-dev @theglitchking/semantic-pages
110
+ ```
49
111
 
50
- **Read**
112
+ **Step 3**: Add a script to your `package.json` file:
113
+ ```json
114
+ {
115
+ "scripts": {
116
+ "notes": "semantic-pages --notes ./vault",
117
+ "notes:stats": "semantic-pages --notes ./vault --stats",
118
+ "notes:reindex": "semantic-pages --notes ./vault --reindex"
119
+ }
120
+ }
121
+ ```
122
+
123
+ ---
124
+
125
+ ### 2. How to Use
126
+
127
+ #### CLI Commands
128
+
129
+ These commands run in your terminal and manage your notes index.
130
+
131
+ | Command | Description |
132
+ |---------|-------------|
133
+ | `semantic-pages --notes <path>` | Start MCP server (default mode) |
134
+ | `semantic-pages --notes <path> --stats` | Show vault statistics and exit |
135
+ | `semantic-pages --notes <path> --reindex` | Force full reindex and exit |
136
+ | `semantic-pages --notes <path> --no-watch` | Start server without file watcher |
137
+ | `semantic-pages tools` | List all 21 MCP tools with descriptions |
138
+ | `semantic-pages tools <name>` | Show arguments and examples for a specific tool |
139
+ | `semantic-pages --version` | Show version number |
140
+ | `semantic-pages --help` | Show all options |
141
+
142
+ ##### Built-in Tool Help
143
+
144
+ Every MCP tool has built-in documentation accessible from the CLI:
145
+
146
+ ```bash
147
+ # List all 21 tools organized by category
148
+ semantic-pages tools
149
+ ```
150
+
151
+ ```
152
+ Semantic Pages — 21 MCP Tools
153
+
154
+ Search:
155
+ search_semantic Vector similarity search — find notes by meaning, not just keywords
156
+ search_text Full-text keyword or regex search with optional filters
157
+ search_graph Graph traversal — find notes connected to a concept via wikilinks and tags
158
+ search_hybrid Combined semantic + graph search — vector results re-ranked by graph proximity
159
+
160
+ Read:
161
+ read_note Read the full content of a specific note by path
162
+ read_multiple_notes Batch read multiple notes in one call
163
+ list_notes List all indexed notes with metadata (title, tags, link count)
164
+ ...
165
+ ```
166
+
167
+ ```bash
168
+ # Get detailed help for a specific tool — arguments, types, and examples
169
+ semantic-pages tools search_semantic
170
+ ```
171
+
172
+ ```
173
+ search_semantic
174
+ ───────────────
175
+ Vector similarity search — find notes by meaning, not just keywords
176
+
177
+ Arguments:
178
+ { "query": "string", "limit?": 10 }
179
+
180
+ Examples:
181
+ { "query": "microservices architecture", "limit": 5 }
182
+ { "query": "how to deploy to production" }
183
+ ```
184
+
185
+ ```bash
186
+ # More examples
187
+ semantic-pages tools update_note # See all 4 editing modes
188
+ semantic-pages tools move_note # See wikilink-aware rename
189
+ semantic-pages tools manage_tags # See add/remove/list actions
190
+ semantic-pages tools rename_tag # See vault-wide tag rename
191
+ ```
192
+
193
+ ##### Command Examples and Details
194
+
195
+ **`--stats` - Check your vault**
196
+
197
+ **How to use it**:
198
+ ```bash
199
+ semantic-pages --notes ./vault --stats
200
+ ```
201
+
202
+ **When to use it**: Quick check to see what's in your vault.
203
+
204
+ **What to expect**:
205
+ ```
206
+ Notes: 47
207
+ Chunks: 312
208
+ Wikilinks: 89
209
+ Tags: 23 unique
210
+ ```
211
+
212
+ ---
213
+
214
+ **`--reindex` - Rebuild the index**
215
+
216
+ **How to use it**:
217
+ ```bash
218
+ semantic-pages --notes ./vault --reindex
219
+ ```
220
+
221
+ **When to use it**:
222
+ - After bulk-adding or modifying notes outside of the MCP tools
223
+ - If the index seems stale or corrupted
224
+ - After changing the embedding model
225
+
226
+ **What to expect**: Full re-parse, re-embed, and re-index of all markdown files. Takes 10-60 seconds depending on vault size and whether the model is cached.
227
+
228
+ ---
229
+
230
+ #### MCP Tools
231
+
232
+ When the server is running (via `.mcp.json` or CLI), Claude has access to these 21 tools:
233
+
234
+ ##### Search Tools
51
235
 
52
236
  | Tool | Description |
53
237
  |------|-------------|
54
- | `read_note` | Read full content of a note |
55
- | `read_multiple_notes` | Batch read multiple notes |
56
- | `list_notes` | List all notes with metadata |
238
+ | `search_semantic` | Vector similarity search: "find notes similar to this idea" |
239
+ | `search_text` | Full-text keyword/regex search with path, tag, and case filters |
240
+ | `search_graph` | Graph traversal — "find notes connected to this concept" |
241
+ | `search_hybrid` | Combined — semantic results re-ranked by graph proximity |
242
+
243
+ **`search_semantic` - Find notes by meaning**
244
+
245
+ **When Claude uses it**: When you ask things like "find notes about deployment strategies" or "what have I written about authentication?"
246
+
247
+ **What to expect**: Returns notes ranked by semantic similarity to your query, with relevance scores and text snippets. Works even if the exact words don't appear in the notes.
248
+
249
+ **Example conversation**:
250
+ ```
251
+ You: What notes do I have about scaling microservices?
252
+ Claude: [calls search_semantic with query "scaling microservices"]
253
+ Claude: I found 4 relevant notes:
254
+ 1. architecture/scaling-patterns.md (0.87 similarity) — discusses horizontal vs vertical scaling
255
+ 2. devops/kubernetes-autoscaling.md (0.82 similarity) — HPA and VPA configuration
256
+ 3. architecture/service-mesh.md (0.71 similarity) — mentions scaling in the context of Istio
257
+ 4. meeting-notes/2024-03-15.md (0.65 similarity) — team discussion about scaling concerns
258
+ ```
259
+
260
+ ---
57
261
 
58
- **Write**
262
+ **`search_text` - Find exact matches**
263
+
264
+ **When Claude uses it**: When you need exact keyword or regex matches, not semantic similarity.
265
+
266
+ **What to expect**: Returns notes containing the exact pattern, with snippets showing context. Supports:
267
+ - Case-sensitive/insensitive search
268
+ - Regex patterns
269
+ - Path glob filters (e.g., only search in `notes/`)
270
+ - Tag filters (e.g., only search notes tagged `#architecture`)
271
+
272
+ ---
273
+
274
+ **`search_graph` - Traverse connections**
275
+
276
+ **When Claude uses it**: When you want to explore how notes are connected — "what's related to this concept?"
277
+
278
+ **What to expect**: Starting from notes matching your concept, does a breadth-first traversal through wikilinks and shared tags, returning all connected notes within the specified depth.
279
+
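As a rough sketch of the depth-limited breadth-first traversal described above: the adjacency-list shape and the `connectedWithin` name are assumptions for illustration, since the README only states that the package builds its graph with graphology.

```typescript
type Adjacency = Record<string, string[]>;

// Returns every note reachable from `start` within `maxDepth` hops,
// mapped to the number of hops needed to reach it.
function connectedWithin(graph: Adjacency, start: string, maxDepth: number): Record<string, number> {
  const depthOf: Record<string, number> = {};
  depthOf[start] = 0;
  const queue: string[] = [start];
  while (queue.length > 0) {
    const current = queue.shift() as string;
    if (depthOf[current] >= maxDepth) continue; // do not expand past the depth limit
    const neighbours = graph[current] || [];
    for (let i = 0; i < neighbours.length; i++) {
      const next = neighbours[i];
      if (depthOf[next] === undefined) {
        depthOf[next] = depthOf[current] + 1;
        queue.push(next);
      }
    }
  }
  return depthOf;
}

const toyGraph: Adjacency = {
  "deployment-guide.md": ["microservices.md"],
  "microservices.md": ["user-service.md", "api-gateway.md"],
  "user-service.md": ["auth.md"],
};
const reachable = connectedWithin(toyGraph, "deployment-guide.md", 2);
console.log(reachable);
```

With a depth of 2, `auth.md` is left out because it sits three hops from the starting note.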
280
+ ---
281
+
282
+ **`search_hybrid` - Best of both**
283
+
284
+ **When Claude uses it**: When you want comprehensive results — semantic matches boosted by graph proximity.
285
+
286
+ **What to expect**: Semantic search results re-ranked so that notes which are also graph-connected score higher. Best for "find everything relevant to X."
287
+
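One way such a re-ranking could work is a weighted blend of semantic similarity and graph proximity. The README does not give the actual formula, so the weighting, the `1 / (1 + hops)` proximity term, and the `rerankByGraph` name below are all assumptions.

```typescript
interface SemanticHit { docPath: string; similarity: number; }
interface HybridHit { docPath: string; score: number; }

// hopsFromSeed maps a note to its graph distance from the query's seed notes;
// notes missing from the map are treated as unconnected (proximity 0).
function rerankByGraph(
  hits: SemanticHit[],
  hopsFromSeed: Record<string, number>,
  graphWeight: number
): HybridHit[] {
  return hits
    .map(h => {
      const hopCount = hopsFromSeed[h.docPath];
      const proximity = hopCount === undefined ? 0 : 1 / (1 + hopCount);
      return {
        docPath: h.docPath,
        score: (1 - graphWeight) * h.similarity + graphWeight * proximity,
      };
    })
    .sort((x, y) => y.score - x.score);
}

const semanticHits: SemanticHit[] = [
  { docPath: "scaling-patterns.md", similarity: 0.8 },
  { docPath: "service-mesh.md", similarity: 0.75 },
];
const hops: Record<string, number> = { "service-mesh.md": 0 };
const reranked = rerankByGraph(semanticHits, hops, 0.3);
console.log(reranked);
```

In this toy input the graph-connected note overtakes a slightly more similar but unconnected one, which is the behavior the description above implies.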
288
+ ---
289
+
290
+ ##### Read Tools
59
291
 
60
292
  | Tool | Description |
61
293
  |------|-------------|
62
- | `create_note` | Create a new markdown note |
63
- | `update_note` | Edit (overwrite, append, prepend, patch by heading) |
64
- | `delete_note` | Delete a note (requires confirmation) |
65
- | `move_note` | Move/rename, updates wikilinks across vault |
294
+ | `read_note` | Read full content of a specific note |
295
+ | `read_multiple_notes` | Batch read multiple notes in one call |
296
+ | `list_notes` | List all indexed notes with metadata (title, tags, link count) |
66
297
 
67
- **Metadata**
298
+ ---
299
+
300
+ ##### Write Tools
68
301
 
69
302
  | Tool | Description |
70
303
  |------|-------------|
71
- | `get_frontmatter` | Read YAML frontmatter as JSON |
72
- | `update_frontmatter` | Set/delete frontmatter keys |
73
- | `manage_tags` | Add, remove, or list tags |
74
- | `rename_tag` | Vault-wide tag rename |
304
+ | `create_note` | Create a new markdown note with optional frontmatter |
305
+ | `update_note` | Edit note content (overwrite, append, prepend, or patch by heading) |
306
+ | `delete_note` | Delete a note (requires explicit confirmation) |
307
+ | `move_note` | Move/rename a note — automatically updates wikilinks across the vault |
308
+
309
+ **`update_note` - Four editing modes**
75
310
 
76
- **Graph**
311
+ **Modes**:
312
+ - `overwrite` — replace entire content
313
+ - `append` — add to the end
314
+ - `prepend` — add after frontmatter, before existing content
315
+ - `patch-by-heading` — replace the content under a specific heading (preserves other sections)
316
+
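The `patch-by-heading` mode is the least obvious of the four, so here is a minimal sketch of how replacing one section while preserving the others might look. `patchByHeading` is a hypothetical function; in particular, appending a new level-2 section when the heading is missing is an assumption, not documented behavior.

```typescript
function patchByHeading(markdown: string, heading: string, newBody: string): string {
  const lines = markdown.split("\n");
  const headingPattern = /^(#{1,6})\s+(.*)$/;
  let start = -1;
  let level = 0;
  for (let i = 0; i < lines.length; i++) {
    const m = headingPattern.exec(lines[i]);
    if (m && m[2].trim() === heading) {
      start = i;
      level = m[1].length;
      break;
    }
  }
  if (start === -1) {
    // Assumed fallback: heading not found, so append a new section at the end.
    return markdown.replace(/\n*$/, "") + "\n\n## " + heading + "\n\n" + newBody + "\n";
  }
  // The section ends at the next heading of the same or a higher level.
  let end = lines.length;
  for (let i = start + 1; i < lines.length; i++) {
    const m = headingPattern.exec(lines[i]);
    if (m && m[1].length <= level) {
      end = i;
      break;
    }
  }
  const before = lines.slice(0, start + 1);
  const after = lines.slice(end);
  return before.concat(["", newBody, ""], after).join("\n");
}

const guide = "# Deploy\n\nIntro.\n\n## Rollback\n\nold text\n\n## Monitoring\n\nDashboards.\n";
const patched = patchByHeading(guide, "Rollback", "Run `kubectl rollout undo`.");
console.log(patched);
```

This line-based version would misread `#` lines inside code fences as headings; an AST-based implementation avoids that.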
317
+ **Example**:
318
+ ```
319
+ You: Add a "Rollback" section to the deployment guide
320
+ Claude: [calls update_note with mode "patch-by-heading", heading "Rollback"]
321
+ Claude: Updated deployment-guide.md — added Rollback section with `kubectl rollout undo` instructions.
322
+ ```
323
+
324
+ ---
325
+
326
+ **`move_note` - Smart rename**
327
+
328
+ **What makes it special**: When you move `user-service.md` to `auth-service.md`, every `[[user-service]]` wikilink in every other note gets updated to `[[auth-service]]` automatically.
329
+
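A wikilink-aware rename has to rewrite link targets without touching longer names that merely share a prefix. The sketch below shows one way to do that for a single note's text; `rewriteWikilinks` is a hypothetical helper, and the package's own implementation may differ.

```typescript
// Escape characters that have special meaning inside a regular expression.
function escapeRegExp(text: string): string {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

function rewriteWikilinks(markdown: string, oldTarget: string, newTarget: string): string {
  // Matches "[[old" only when followed by "]", "|" or "#", so [[old]], [[old|alias]]
  // and [[old#heading]] are rewritten while [[old-v2]] is left alone.
  const pattern = new RegExp("\\[\\[" + escapeRegExp(oldTarget) + "(?=[\\]|#])", "g");
  return markdown.replace(pattern, () => "[[" + newTarget);
}

const before = "See [[user-service]], [[user-service|users]] and [[user-service-v2]].";
const renamed = rewriteWikilinks(before, "user-service", "auth-service");
console.log(renamed);
```

Applying this to every note in the vault, then moving the file itself, gives the behavior described above.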
330
+ ---
331
+
332
+ ##### Metadata Tools
77
333
 
78
334
  | Tool | Description |
79
335
  |------|-------------|
80
- | `backlinks` | Notes linking TO a given note |
81
- | `forwardlinks` | Notes linked FROM a given note |
82
- | `graph_path` | Shortest path between two notes |
83
- | `graph_statistics` | Most connected nodes, orphans, density |
336
+ | `get_frontmatter` | Read parsed YAML frontmatter as JSON |
337
+ | `update_frontmatter` | Set or delete frontmatter keys atomically (pass `null` to delete) |
338
+ | `manage_tags` | Add, remove, or list tags on a note (frontmatter + inline) |
339
+ | `rename_tag` | Rename a tag across all notes in the vault |
84
340
 
85
- **System**
341
+ **`rename_tag` - Vault-wide tag rename**
342
+
343
+ **When Claude uses it**: When you want to rename `#architecture` to `#arch` everywhere — in frontmatter `tags:` arrays and inline `#tags` across every file.
344
+
345
+ **What to expect**: Returns the count of files modified.
346
+
347
+ ---
348
+
349
+ ##### Graph Tools
86
350
 
87
351
  | Tool | Description |
88
352
  |------|-------------|
89
- | `get_stats` | Vault stats (notes, chunks, embeddings, graph) |
90
- | `reindex` | Force full reindex |
353
+ | `backlinks` | All notes that link TO a given note via `[[wikilinks]]` |
354
+ | `forwardlinks` | All notes linked FROM a given note |
355
+ | `graph_path` | Shortest path between two notes in the knowledge graph |
356
+ | `graph_statistics` | Most connected nodes, orphan count, graph density |
91
357
 
92
- ## How It Works
358
+ **`graph_path` - Find connections between notes**
93
359
 
94
- 1. **Indexes** all `.md` files: parses frontmatter, extracts `[[wikilinks]]`, `#tags`, headers
95
- 2. **Embeds** text chunks using a local model ([nomic-embed-text-v1.5](https://huggingface.co/nomic-ai/nomic-embed-text-v1.5)) via WASM — no API key needed
96
- 3. **Builds** a knowledge graph from wikilinks and shared tags using [graphology](https://graphology.github.io/)
97
- 4. **Creates** an HNSW vector index for fast approximate nearest neighbor search
98
- 5. **Watches** for file changes and re-indexes incrementally
99
- 6. **Serves** all of this over MCP stdio protocol
360
+ **Example conversation**:
361
+ ```
362
+ You: How are the deployment guide and the user service connected?
363
+ Claude: [calls graph_path from "deployment-guide.md" to "user-service.md"]
364
+ Claude: Path: deployment-guide.md → microservices.md → user-service.md
365
+ The deployment guide links to the microservices overview, which links to the user service.
366
+ ```
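For an unweighted link graph, a shortest path like the one in this conversation can be found with breadth-first search. The sketch below assumes a plain adjacency list and a hypothetical `shortestPath` function; the README says the package itself relies on `graphology-shortest-path`.

```typescript
function shortestPath(graph: Record<string, string[]>, from: string, to: string): string[] | null {
  // previous[x] is the note we arrived from; null marks the starting note.
  const previous: Record<string, string | null> = {};
  previous[from] = null;
  const queue: string[] = [from];
  while (queue.length > 0) {
    const current = queue.shift() as string;
    if (current === to) {
      // Walk the predecessor chain back to the start to rebuild the route.
      const route: string[] = [];
      let step: string | null = current;
      while (step !== null) {
        route.unshift(step);
        const prior: string | null | undefined = previous[step];
        step = prior === undefined ? null : prior;
      }
      return route;
    }
    const neighbours = graph[current] || [];
    for (let i = 0; i < neighbours.length; i++) {
      if (previous[neighbours[i]] === undefined) {
        previous[neighbours[i]] = current;
        queue.push(neighbours[i]);
      }
    }
  }
  return null; // no route between the two notes
}

const links: Record<string, string[]> = {
  "deployment-guide.md": ["runbook.md", "microservices.md"],
  "runbook.md": ["oncall.md"],
  "microservices.md": ["user-service.md"],
};
const route = shortestPath(links, "deployment-guide.md", "user-service.md");
console.log(route);
```

Because BFS explores notes in order of hop count, the first time it reaches the target is guaranteed to be via a shortest route.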
100
367
 
101
- The index is stored in `.semantic-pages-index/` alongside your notes (gitignore it). The model is downloaded once to `~/.semantic-pages/models/`.
368
+ ---
102
369
 
103
- ## CLI
370
+ **`graph_statistics` - Vault health overview**
104
371
 
105
- ```bash
106
- # Start MCP server (default)
107
- semantic-pages --notes ./vault
372
+ **What to expect**:
373
+ ```json
374
+ {
375
+ "totalNodes": 47,
376
+ "totalEdges": 89,
377
+ "orphanCount": 3,
378
+ "mostConnected": [
379
+ { "path": "project-overview.md", "connections": 12 },
380
+ { "path": "microservices.md", "connections": 9 }
381
+ ],
382
+ "density": 0.04
383
+ }
384
+ ```
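The sample numbers above are consistent with the standard density formula for a simple directed graph, edges divided by `n * (n - 1)`. Whether the package computes density exactly this way is an assumption; `directedDensity` and `countOrphans` below are hypothetical helpers.

```typescript
// Density of a simple directed graph: actual edges over possible ordered pairs.
function directedDensity(nodeCount: number, edgeCount: number): number {
  if (nodeCount < 2) return 0;
  return edgeCount / (nodeCount * (nodeCount - 1));
}

// An orphan is a note with no incoming and no outgoing links.
function countOrphans(nodes: string[], edges: [string, string][]): number {
  const linked: Record<string, boolean> = {};
  for (let i = 0; i < edges.length; i++) {
    linked[edges[i][0]] = true;
    linked[edges[i][1]] = true;
  }
  let orphans = 0;
  for (let i = 0; i < nodes.length; i++) {
    if (!linked[nodes[i]]) orphans++;
  }
  return orphans;
}

const density = directedDensity(47, 89);
const orphanTotal = countOrphans(["a.md", "b.md", "c.md"], [["a.md", "b.md"]]);
console.log(density.toFixed(2), orphanTotal);
```

With 47 nodes and 89 edges the formula gives roughly 0.041, which rounds to the `0.04` shown in the example output.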
108
385
 
109
- # Show vault statistics
110
- semantic-pages --notes ./vault --stats
386
+ ---
111
387
 
112
- # Force reindex and exit
113
- semantic-pages --notes ./vault --reindex
388
+ ##### System Tools
114
389
 
115
- # Disable file watcher
116
- semantic-pages --notes ./vault --no-watch
390
+ | Tool | Description |
391
+ |------|-------------|
392
+ | `get_stats` | Vault stats — total notes, chunks, embeddings, graph density, model info |
393
+ | `reindex` | Force full reindex of the vault |
394
+
395
+ ---
396
+
397
+ ## Common Workflows
398
+
399
+ ### Quick Vault Check (10 seconds)
400
+ ```bash
401
+ semantic-pages --notes ./vault --stats
117
402
  ```
118
403
 
119
- ## As a Library
404
+ ### Adding Semantic Pages to a Project (2 minutes)
405
+ ```bash
406
+ # Step 1: Create .mcp.json in your project root
407
+ echo '{
408
+ "mcpServers": { "semantic-pages": {
409
+ "command": "npx",
410
+ "args": ["-y", "semantic-pages", "--notes", "./notes"]
411
+ } }
412
+ }' > .mcp.json
120
413
 
121
- ```typescript
122
- import { Indexer, Embedder, GraphBuilder, VectorIndex } from "semantic-pages";
414
+ # Step 2: Add index to .gitignore
415
+ echo ".semantic-pages-index/" >> .gitignore
123
416
 
124
- const indexer = new Indexer("./vault");
125
- const docs = await indexer.indexAll();
417
+ # Step 3: Start Claude — it now has 21 note tools
418
+ claude
419
+ ```
126
420
 
127
- const embedder = new Embedder();
128
- await embedder.init();
129
- const vec = await embedder.embed("search query");
421
+ ### Asking Claude About Your Notes
130
422
  ```
423
+ You: What have I written about authentication?
424
+ Claude: [calls search_semantic] I found 3 notes about authentication...
425
+
426
+ You: What links to the API gateway doc?
427
+ Claude: [calls backlinks] 4 notes link to api-gateway.md...
131
428
 
132
- ## Per-Repo Pattern
429
+ You: Create a new note summarizing today's meeting
430
+ Claude: [calls create_note] Created meeting-2024-03-15.md with frontmatter...
431
+
432
+ You: Rename the #backend tag to #server across all notes
433
+ Claude: [calls rename_tag] Renamed #backend to #server in 12 files.
434
+ ```
133
435
 
436
+ ### Per-Repo Pattern
134
437
  ```
135
438
  any-repo/
136
439
  ├── notes/ # your markdown files
@@ -141,10 +444,257 @@ any-repo/
141
444
 
142
445
  Each repo gets its own independent knowledge base. No shared state between projects.
143
446
 
447
+ ---
448
+
449
+ ## Technical Details
450
+
451
+ ### Architecture Overview
452
+
453
+ Semantic Pages is built with TypeScript and organized into a core library with thin transport layers:
454
+
455
+ ```
456
+ src/
457
+ ├── core/ # Pure library — no transport assumptions
458
+ │ ├── index.ts # Core exports
459
+ │ ├── types.ts # Shared type definitions
460
+ │ ├── indexer.ts # Markdown parser (unified + remark)
461
+ │ ├── embedder.ts # Local embedding model (@huggingface/transformers)
462
+ │ ├── graph.ts # Knowledge graph (graphology)
463
+ │ ├── vector.ts # HNSW vector index (hnswlib-node)
464
+ │ ├── search-text.ts # Full-text / regex search
465
+ │ ├── crud.ts # Create/update/delete/move notes
466
+ │ ├── frontmatter.ts # Frontmatter + tag management
467
+ │ └── watcher.ts # File watcher (chokidar)
468
+
469
+ ├── mcp/ # MCP stdio server (thin wrapper over core)
470
+ │ └── server.ts # Server setup + 21 tool definitions
471
+
472
+ └── cli/ # CLI entrypoint
473
+ └── index.ts # commander-based CLI
474
+ ```
475
+
476
+ ### Tech Stack
477
+
478
+ | Concern | Package | Why |
479
+ |---------|---------|-----|
480
+ | Markdown parsing | `unified` + `remark-parse` | AST-based, handles wikilinks |
481
+ | Frontmatter | `gray-matter` | YAML/TOML frontmatter extraction |
482
+ | Wikilinks | `remark-wiki-link` | `[[note-name]]` extraction from AST |
483
+ | Embeddings | `@huggingface/transformers` | WASM runtime, no Python, no API key |
484
+ | Embedding model | `nomic-embed-text-v1.5` | High quality, ~80MB, runs locally |
485
+ | Vector index | `hnswlib-node` | HNSW algorithm, same as production vector DBs |
486
+ | Knowledge graph | `graphology` | Directed graph, serializable, rich algorithms |
487
+ | Graph algorithms | `graphology-traversal` + `graphology-shortest-path` | BFS, shortest path |
488
+ | File watching | `chokidar` | Cross-platform, debounced |
489
+ | MCP server | `@modelcontextprotocol/sdk` | Official MCP TypeScript SDK |
490
+ | CLI | `commander` | Standard Node.js CLI framework |
491
+
492
+ ### Index Layout
493
+
494
+ ```
495
+ .semantic-pages-index/ # gitignored, rebuilt on demand
496
+ ├── embeddings.json # serialized chunk vectors
497
+ ├── hnsw.bin # HNSW vector index
498
+ ├── hnsw-meta.json # chunk → document mapping
499
+ ├── graph.json # knowledge graph (graphology format)
500
+ └── meta.json # index metadata (vault path, model, timestamp)
501
+ ```
502
+
503
+ ### Document Processing Pipeline
504
+
505
+ #### Step 1: Parse
506
+ ```
507
+ .md file → gray-matter (frontmatter) → remark (AST) → extract:
508
+ - title (frontmatter > first heading > filename)
509
+ - wikilinks ([[note-name]])
510
+ - tags (frontmatter tags: + inline #tags)
511
+ - headers (H1-H6)
512
+ - plain text (markdown stripped)
513
+ ```
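The title precedence listed in Step 1 (frontmatter, then first heading, then filename) can be sketched as a small fallback chain. `resolveTitle` is a hypothetical function, and it assumes the frontmatter has already been parsed into a plain object.

```typescript
function resolveTitle(frontmatter: Record<string, unknown>, body: string, filePath: string): string {
  // 1. Prefer an explicit, non-empty frontmatter title.
  const fmTitle = frontmatter["title"];
  if (typeof fmTitle === "string" && fmTitle.trim() !== "") return fmTitle.trim();
  // 2. Otherwise use the first markdown heading of any level.
  const heading = /^#{1,6}\s+(.+)$/m.exec(body);
  if (heading) return heading[1].trim();
  // 3. Fall back to the filename without its .md extension.
  const fileName = filePath.split("/").pop() as string;
  return fileName.replace(/\.md$/i, "");
}

const fromFrontmatter = resolveTitle({ title: "Overview" }, "# Other", "notes/x.md");
const fromHeading = resolveTitle({}, "intro\n## Setup\ntext", "notes/x.md");
const fromFilename = resolveTitle({}, "plain text only", "notes/deploy-guide.md");
console.log(fromFrontmatter, fromHeading, fromFilename);
```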
514
+
515
+ #### Step 2: Chunk
516
+ ```
517
+ Plain text → split at sentence boundaries → ~512 token chunks
518
+ ```
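A rough sketch of sentence-boundary chunking follows. It counts whitespace-separated words as a crude stand-in for model tokens and uses a naive sentence splitter, both of which are assumptions; `chunkBySentence` is not the package's function.

```typescript
function chunkBySentence(text: string, maxTokens: number): string[] {
  // Naive split: a run of text ending in ., ! or ?. Abbreviations and
  // version numbers such as "v1.5" will be split incorrectly.
  const sentences = text.match(/[^.!?]+[.!?]*/g) || [];
  const chunks: string[] = [];
  let current: string[] = [];
  let currentTokens = 0;
  for (let i = 0; i < sentences.length; i++) {
    const sentence = sentences[i].trim();
    if (sentence === "") continue;
    const tokens = sentence.split(/\s+/).length; // word count as a token estimate
    // Close the current chunk if adding this sentence would exceed the budget.
    if (currentTokens > 0 && currentTokens + tokens > maxTokens) {
      chunks.push(current.join(" "));
      current = [];
      currentTokens = 0;
    }
    current.push(sentence);
    currentTokens += tokens;
  }
  if (current.length > 0) chunks.push(current.join(" "));
  return chunks;
}

const pieces = chunkBySentence("One two three. Four five. Six seven eight nine. Ten.", 5);
console.log(pieces);
```

A single sentence longer than the budget is kept whole here, so chunks are only approximately bounded, in line with the "~512 token" wording above.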
519
+
520
+ #### Step 3: Embed
521
+ ```
522
+ Each chunk → nomic-embed-text-v1.5 (WASM) → normalized Float32Array
523
+ ```
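Normalizing each embedding matters because, for unit-length vectors, cosine similarity reduces to a plain dot product. The sketch below illustrates that property with hypothetical `l2Normalize` and `dot` helpers and two-dimensional toy vectors; it does not call the embedding model.

```typescript
// Scale a vector to unit length (L2 norm of 1). A zero vector stays zero.
function l2Normalize(vector: Float32Array): Float32Array {
  let sumSquares = 0;
  for (let i = 0; i < vector.length; i++) sumSquares += vector[i] * vector[i];
  const norm = Math.sqrt(sumSquares);
  const out = new Float32Array(vector.length);
  if (norm === 0) return out;
  for (let i = 0; i < vector.length; i++) out[i] = vector[i] / norm;
  return out;
}

function dot(a: Float32Array, b: Float32Array): number {
  let total = 0;
  for (let i = 0; i < a.length; i++) total += a[i] * b[i];
  return total;
}

const u = l2Normalize(new Float32Array([3, 4]));
const v = l2Normalize(new Float32Array([4, 3]));
// For unit vectors the dot product equals the cosine of the angle between them.
console.log(dot(u, u), dot(u, v));
```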
524
+
525
+ #### Step 4: Index
526
+ ```
527
+ Embeddings → HNSW index (hnswlib-node)
528
+ Wikilinks + tags → directed graph (graphology)
529
+ ```
530
+
531
+ #### Step 5: Serve
532
+ ```
533
+ MCP tools → query embeddings / graph / files → return results
534
+ ```
535
+
536
+ ### Using as a Library
537
+
538
+ The core library is importable independently of the MCP server:
539
+
540
+ ```typescript
541
+ import { Indexer, Embedder, GraphBuilder, VectorIndex, TextSearch } from "@theglitchking/semantic-pages";
542
+
543
+ // Index all notes
544
+ const indexer = new Indexer("./vault");
545
+ const docs = await indexer.indexAll();
546
+
547
+ // Build embeddings
548
+ const embedder = new Embedder();
549
+ await embedder.init();
550
+ const chunks = docs.flatMap(d => d.chunks);
551
+ const vecs = await embedder.embedBatch(chunks);
552
+
553
+ // Build vector index
554
+ const vectorIndex = new VectorIndex(embedder.getDimensions());
555
+ vectorIndex.build(vecs, chunks.map((text, i) => ({
556
+ docPath: docs.find(d => d.chunks.includes(text))!.path, // owning document (assumes chunks are strings)
557
+ chunkIndex: i,
558
+ text
559
+ })));
560
+
561
+ // Search
562
+ const queryVec = await embedder.embed("microservices architecture");
563
+ const results = vectorIndex.search(queryVec, 5);
564
+
565
+ // Build knowledge graph
566
+ const graph = new GraphBuilder();
567
+ graph.buildFromDocuments(docs);
568
+ const backlinks = graph.backlinks("project-overview.md");
569
+ const path = graph.findPath("overview.md", "auth.md");
570
+ ```
571
+
572
+ ### Performance
573
+
574
+ | Metric | Value |
575
+ |--------|-------|
576
+ | Index 100 notes | ~5 seconds |
577
+ | Index 1,000 notes | ~30 seconds |
578
+ | Semantic search latency | <100ms |
579
+ | Text search latency | <10ms |
580
+ | Graph traversal latency | <5ms |
581
+ | Model download (first run) | ~80MB, cached at `~/.semantic-pages/models/` |
582
+ | Index size (100 notes) | ~10MB |
583
+ | npm package size | 85.7 kB |
584
+
585
+ ---
586
+
144
587
  ## Requirements
145
588
 
146
- - Node.js >= 18
589
+ - **Node.js**: Version 18.0.0 or higher
590
+ - **Operating System**: Linux, macOS, or Windows (with WSL2)
591
+ - **Disk Space**: ~80MB for the embedding model (downloaded once)
592
+
593
+ ---
594
+
595
+ ## Troubleshooting
596
+
597
+ ### Installation Issues
598
+
599
+ **Problem**: `npx semantic-pages` fails or shows "not found"
600
+
601
+ **Solution**:
602
+ ```bash
603
+ # Retry, skipping the npx install prompt
604
+ npx --yes semantic-pages --notes ./vault --stats
605
+
606
+ # Or install globally
607
+ npm install -g @theglitchking/semantic-pages
608
+ ```
609
+
610
+ **Problem**: Model download fails
611
+
612
+ **Solution**:
613
+ ```bash
614
+ # Check internet connection, then retry
615
+ # The model is cached at ~/.semantic-pages/models/
616
+ # Delete and re-download if corrupted:
617
+ rm -rf ~/.semantic-pages/models/
618
+ semantic-pages --notes ./vault --reindex
619
+ ```
620
+
621
+ ### Usage Issues
622
+
623
+ **Problem**: Search returns no results
624
+
625
+ **Solution**:
626
+ ```bash
627
+ # Force reindex
628
+ semantic-pages --notes ./vault --reindex
629
+
630
+ # Check that .md files exist in the path
631
+ ls ./vault/*.md
632
+ ```
633
+
634
+ **Problem**: Index seems stale after editing files externally
635
+
636
+ **Solution**: The file watcher should catch changes, but if it misses some:
637
+ ```bash
638
+ # Force reindex
639
+ semantic-pages --notes ./vault --reindex
640
+ ```
641
+
642
+ **Problem**: `hnswlib-node` fails to install (native addon)
643
+
644
+ **Solution**:
645
+ ```bash
646
+ # Install build tools
647
+ # On Ubuntu/Debian:
648
+ sudo apt install build-essential python3
649
+
650
+ # On macOS:
651
+ xcode-select --install
652
+
653
+ # Then retry
654
+ npm install -g @theglitchking/semantic-pages
655
+ ```
656
+
657
+ ---
658
+
659
+ ## Contributing
660
+
661
+ Contributions are welcome! The project uses:
662
+ - **TypeScript** with strict mode
663
+ - **tsup** for bundling (ESM)
664
+ - **vitest** for testing (123 tests across 11 suites)
665
+
666
+ ```bash
667
+ # Clone and install
668
+ git clone https://github.com/TheGlitchKing/semantic-pages.git
669
+ cd semantic-pages
670
+ npm install
671
+
672
+ # Run tests
673
+ npm test
674
+
675
+ # Build
676
+ npm run build
677
+
678
+ # Type check
679
+ npm run lint
680
+ ```
681
+
682
+ ---
147
683
 
148
684
  ## License
149
685
 
150
- MIT
686
+ MIT License - see [LICENSE](./LICENSE) file for details.
687
+
688
+ ---
689
+
690
+ ## Support
691
+
692
+ - **GitHub Issues**: [Report bugs or request features](https://github.com/TheGlitchKing/semantic-pages/issues)
693
+ - **NPM Package**: [@theglitchking/semantic-pages](https://www.npmjs.com/package/@theglitchking/semantic-pages)
694
+ - **Marketplace**: [Glitch Kingdom of Plugins](https://github.com/TheGlitchKing/glitch-kingdom-of-plugins)
695
+
696
+ ---
697
+
698
+ **Made with care by TheGlitchKing**
699
+
700
+ [NPM](https://www.npmjs.com/package/@theglitchking/semantic-pages) | [GitHub](https://github.com/TheGlitchKing/semantic-pages) | [Issues](https://github.com/TheGlitchKing/semantic-pages/issues)