@theglitchking/semantic-pages 0.1.3 → 0.3.0
- package/README.md +621 -71
- package/dist/{chunk-KF45H64M.js → chunk-L6KIXRRI.js} +191 -20
- package/dist/chunk-L6KIXRRI.js.map +1 -0
- package/dist/chunk-TDC45FQJ.js +0 -0
- package/dist/chunk-TDC45FQJ.js.map +0 -0
- package/dist/cli/index.d.ts +0 -0
- package/dist/cli/index.js +221 -14
- package/dist/cli/index.js.map +1 -1
- package/dist/core/embed-worker.d.ts +2 -0
- package/dist/core/embed-worker.js +89 -0
- package/dist/core/embed-worker.js.map +1 -0
- package/dist/core/index.d.ts +20 -3
- package/dist/core/index.js +1 -1
- package/dist/core/index.js.map +0 -0
- package/dist/indexer-HSCSXWIO.js +0 -0
- package/dist/indexer-HSCSXWIO.js.map +0 -0
- package/dist/mcp/server.d.ts +9 -6
- package/dist/mcp/server.js +168 -64
- package/dist/mcp/server.js.map +1 -1
- package/package.json +5 -1
- package/dist/chunk-KF45H64M.js.map +0 -1
package/README.md
CHANGED
|
@@ -1,25 +1,89 @@
|
|
|
1
|
-
#
|
|
1
|
+
# Semantic Pages
|
|
2
2
|
|
|
3
|
-
Semantic search + knowledge graph MCP server for any folder of markdown files.
|
|
3
|
+
> Semantic search + knowledge graph MCP server for any folder of markdown files.
|
|
4
4
|
|
|
5
|
-
|
|
5
|
+
[npm package](https://www.npmjs.com/package/@theglitchking/semantic-pages)
|
|
6
|
+
[License: MIT](https://opensource.org/licenses/MIT)
|
|
6
7
|
|
|
7
|
-
|
|
8
|
+
> [!IMPORTANT]
|
|
9
|
+
> Semantic Pages downloads a local embedding model (~80MB) on first launch. The download happens once and is cached at `~/.semantic-pages/models/`. No API key required. No data leaves your machine.
|
|
8
10
|
|
|
11
|
+
---
|
|
12
|
+
|
|
13
|
+
## Summary
|
|
14
|
+
|
|
15
|
+
When you have markdown notes scattered across a project — a `vault/`, `docs/`, `notes/`, or wiki — your AI assistant can't search them by meaning, traverse their connections, or help you maintain them. Semantic Pages fixes this by indexing your markdown files into a vector database and knowledge graph, then exposing 21 MCP tools that let Claude (or any MCP-compatible client) search semantically, traverse wikilinks, manage frontmatter, and perform full CRUD operations. No Docker, no Python, no Obsidian required — just `npx`.
|
|
16
|
+
|
|
17
|
+
---
|
|
18
|
+
|
|
19
|
+
## Operational Summary
|
|
20
|
+
|
|
21
|
+
The server indexes all `.md` files in a directory you point it at. Each file is parsed for YAML frontmatter, `[[wikilinks]]`, `#tags`, and headings. The text content is split into ~512-token chunks and embedded locally using the `nomic-embed-text-v1.5` model running via WebAssembly in Node.js. These embeddings are stored in an HNSW index for fast approximate nearest neighbor search. Simultaneously, a directed graph is built from wikilinks and shared tags using graphology.
|
|
22
|
+
|
|
23
|
+
When Claude calls `search_semantic`, the query is embedded and compared against all chunks via cosine similarity. When Claude calls `search_graph`, it does a breadth-first traversal from matching nodes. `search_hybrid` combines both — semantic results re-ranked by graph proximity. Beyond search, Claude can create, read, update, delete, and move notes, manage YAML frontmatter fields, add/remove/rename tags vault-wide, and query the knowledge graph for backlinks, forwardlinks, shortest paths, and connectivity statistics.
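
The scoring ideas in the paragraph above can be sketched as follows. This is an illustration, not the package's implementation: `cosineSimilarity` is the standard formula, while the `Hit` shape, `rerankByGraph`, and its `boost` weight are hypothetical names and values chosen for the example.

```typescript
// Hypothetical result shape; the real tool output may differ.
interface Hit {
  path: string;
  score: number; // cosine similarity to the query
}

// Standard cosine similarity between two equal-length vectors.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("vector lengths differ");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0;
  return dot / Math.sqrt(normA * normB);
}

// Assumed re-ranking rule: add a fixed boost to hits that are also
// graph-connected to the matching notes, then sort by score, highest first.
function rerankByGraph(hits: Hit[], connected: Set<string>, boost = 0.1): Hit[] {
  return hits
    .map((h) => ({ ...h, score: h.score + (connected.has(h.path) ? boost : 0) }))
    .sort((x, y) => y.score - x.score);
}
```

A fixed additive boost is only one possible rule. Weighting by hop distance would be another reasonable choice.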
|
|
24
|
+
|
|
25
|
+
The index is stored in `.semantic-pages-index/` alongside your notes (gitignore it). A file watcher detects changes and re-indexes incrementally. Everything runs locally over stdio — no network, no server, no background processes beyond the MCP connection itself.
|
|
26
|
+
|
|
27
|
+
---
|
|
28
|
+
|
|
29
|
+
## Features
|
|
30
|
+
|
|
31
|
+
- **Semantic Search**: Find notes by meaning, not just keywords, using local vector embeddings
|
|
32
|
+
- **Knowledge Graph**: Traverse `[[wikilinks]]` and shared `#tags` as a directed graph
|
|
33
|
+
- **Hybrid Search**: Combined vector + graph search with re-ranking
|
|
34
|
+
- **Full-Text Search**: Keyword and regex search with path, tag, and case filters
|
|
35
|
+
- **Full CRUD**: Create, read, update (overwrite/append/prepend/patch-by-heading), delete, and move notes
|
|
36
|
+
- **Frontmatter Management**: Get and set YAML frontmatter fields atomically
|
|
37
|
+
- **Tag Management**: Add, remove, list, and rename tags vault-wide (frontmatter + inline)
|
|
38
|
+
- **Graph Queries**: Backlinks, forwardlinks, shortest path, graph statistics (orphans, density, most connected)
|
|
39
|
+
- **File Watcher**: Incremental re-indexing on file changes with debounce
|
|
40
|
+
- **Local Embeddings**: No API key, no network after first model download
|
|
41
|
+
- **Zero Dependencies Beyond Node**: No Docker, no Python, no Obsidian, no GUI
|
|
42
|
+
|
|
43
|
+
---
|
|
44
|
+
|
|
45
|
+
## Quick Start
|
|
46
|
+
|
|
47
|
+
### 1. Installation Methods
|
|
48
|
+
|
|
49
|
+
#### Method A: NPX (No installation needed)
|
|
50
|
+
|
|
51
|
+
This lets you run the server without installing it permanently.
|
|
52
|
+
|
|
53
|
+
**Step 1**: Open your terminal in your project folder
|
|
54
|
+
|
|
55
|
+
**Step 2**: Run:
|
|
9
56
|
```bash
|
|
10
|
-
npx semantic-pages --notes ./vault
|
|
57
|
+
npx semantic-pages --notes ./vault --stats
|
|
11
58
|
```
|
|
12
59
|
|
|
13
|
-
|
|
60
|
+
**Step 3**: The first time you run it, NPX downloads the package and the embedding model (~80MB). This takes 1-2 minutes.
|
|
61
|
+
|
|
62
|
+
**Step 4**: After that, it runs instantly.
|
|
63
|
+
|
|
64
|
+
**Use this method when**: You want to try it out, or you're adding it to a project's `.mcp.json` config.
|
|
14
65
|
|
|
66
|
+
#### Method B: Global Installation (Recommended for regular use)
|
|
67
|
+
|
|
68
|
+
This installs the tool on your computer so you can use it in any project.
|
|
69
|
+
|
|
70
|
+
**Step 1**: Open your terminal
|
|
71
|
+
|
|
72
|
+
**Step 2**: Type this command and press Enter:
|
|
15
73
|
```bash
|
|
16
|
-
npm install -g semantic-pages
|
|
17
|
-
semantic-pages --notes ./vault
|
|
74
|
+
npm install -g @theglitchking/semantic-pages
|
|
18
75
|
```
|
|
19
76
|
|
|
20
|
-
|
|
77
|
+
**Step 3**: Test that it worked:
|
|
78
|
+
```bash
|
|
79
|
+
semantic-pages --version
|
|
80
|
+
```
|
|
81
|
+
|
|
82
|
+
**Step 4**: You should see a version number. If you do, it's installed correctly!
|
|
21
83
|
|
|
22
|
-
|
|
84
|
+
#### Method C: MCP Configuration (Recommended for Claude Code)
|
|
85
|
+
|
|
86
|
+
Add to your project's `.mcp.json` so Claude has automatic access:
|
|
23
87
|
|
|
24
88
|
```json
|
|
25
89
|
{
|
|
@@ -32,105 +96,344 @@ Add to your `.mcp.json` (Claude Code, Cursor, etc.):
|
|
|
32
96
|
|
|
33
97
|
Point `--notes` at any folder of `.md` files: `./vault`, `./docs`, `./notes`, or `.` for the whole repo.
|
|
34
98
|
|
|
35
|
-
|
|
99
|
+
**What to expect**: Next time you run `claude` in that project, Claude will have 21 new tools for searching, reading, writing, and traversing your notes.
|
|
36
100
|
|
|
37
|
-
|
|
101
|
+
#### Method D: Project Installation (For team projects)
|
|
38
102
|
|
|
39
|
-
|
|
103
|
+
This installs the tool only for one specific project.
|
|
40
104
|
|
|
41
|
-
**
|
|
105
|
+
**Step 1**: Open your terminal in your project folder
|
|
42
106
|
|
|
43
|
-
|
|
44
|
-
|
|
45
|
-
|
|
46
|
-
|
|
47
|
-
| `search_graph` | Graph traversal via wikilinks and tags |
|
|
48
|
-
| `search_hybrid` | Combined semantic + graph, re-ranked |
|
|
107
|
+
**Step 2**: Type this command:
|
|
108
|
+
```bash
|
|
109
|
+
npm install --save-dev @theglitchking/semantic-pages
|
|
110
|
+
```
|
|
49
111
|
|
|
50
|
-
**
|
|
112
|
+
**Step 3**: Add a script to your `package.json` file:
|
|
113
|
+
```json
|
|
114
|
+
{
|
|
115
|
+
"scripts": {
|
|
116
|
+
"notes": "semantic-pages --notes ./vault",
|
|
117
|
+
"notes:stats": "semantic-pages --notes ./vault --stats",
|
|
118
|
+
"notes:reindex": "semantic-pages --notes ./vault --reindex"
|
|
119
|
+
}
|
|
120
|
+
}
|
|
121
|
+
```
|
|
122
|
+
|
|
123
|
+
---
|
|
124
|
+
|
|
125
|
+
### 2. How to Use
|
|
126
|
+
|
|
127
|
+
#### CLI Commands
|
|
128
|
+
|
|
129
|
+
These commands run in your terminal and manage your notes index.
|
|
130
|
+
|
|
131
|
+
| Command | Description |
|
|
132
|
+
|---------|-------------|
|
|
133
|
+
| `semantic-pages --notes <path>` | Start MCP server (default mode) |
|
|
134
|
+
| `semantic-pages --notes <path> --stats` | Show vault statistics and exit |
|
|
135
|
+
| `semantic-pages --notes <path> --reindex` | Force full reindex and exit |
|
|
136
|
+
| `semantic-pages --notes <path> --no-watch` | Start server without file watcher |
|
|
137
|
+
| `semantic-pages tools` | List all 21 MCP tools with descriptions |
|
|
138
|
+
| `semantic-pages tools <name>` | Show arguments and examples for a specific tool |
|
|
139
|
+
| `semantic-pages --version` | Show version number |
|
|
140
|
+
| `semantic-pages --help` | Show all options |
|
|
141
|
+
|
|
142
|
+
##### Built-in Tool Help
|
|
143
|
+
|
|
144
|
+
Every MCP tool has built-in documentation accessible from the CLI:
|
|
145
|
+
|
|
146
|
+
```bash
|
|
147
|
+
# List all 21 tools organized by category
|
|
148
|
+
semantic-pages tools
|
|
149
|
+
```
|
|
150
|
+
|
|
151
|
+
```
|
|
152
|
+
Semantic Pages — 21 MCP Tools
|
|
153
|
+
|
|
154
|
+
Search:
|
|
155
|
+
search_semantic Vector similarity search — find notes by meaning, not just keywords
|
|
156
|
+
search_text Full-text keyword or regex search with optional filters
|
|
157
|
+
search_graph Graph traversal — find notes connected to a concept via wikilinks and tags
|
|
158
|
+
search_hybrid Combined semantic + graph search — vector results re-ranked by graph proximity
|
|
159
|
+
|
|
160
|
+
Read:
|
|
161
|
+
read_note Read the full content of a specific note by path
|
|
162
|
+
read_multiple_notes Batch read multiple notes in one call
|
|
163
|
+
list_notes List all indexed notes with metadata (title, tags, link count)
|
|
164
|
+
...
|
|
165
|
+
```
|
|
166
|
+
|
|
167
|
+
```bash
|
|
168
|
+
# Get detailed help for a specific tool — arguments, types, and examples
|
|
169
|
+
semantic-pages tools search_semantic
|
|
170
|
+
```
|
|
171
|
+
|
|
172
|
+
```
|
|
173
|
+
search_semantic
|
|
174
|
+
───────────────
|
|
175
|
+
Vector similarity search — find notes by meaning, not just keywords
|
|
176
|
+
|
|
177
|
+
Arguments:
|
|
178
|
+
{ "query": "string", "limit?": 10 }
|
|
179
|
+
|
|
180
|
+
Examples:
|
|
181
|
+
{ "query": "microservices architecture", "limit": 5 }
|
|
182
|
+
{ "query": "how to deploy to production" }
|
|
183
|
+
```
|
|
184
|
+
|
|
185
|
+
```bash
|
|
186
|
+
# More examples
|
|
187
|
+
semantic-pages tools update_note # See all 4 editing modes
|
|
188
|
+
semantic-pages tools move_note # See wikilink-aware rename
|
|
189
|
+
semantic-pages tools manage_tags # See add/remove/list actions
|
|
190
|
+
semantic-pages tools rename_tag # See vault-wide tag rename
|
|
191
|
+
```
|
|
192
|
+
|
|
193
|
+
##### Command Examples and Details
|
|
194
|
+
|
|
195
|
+
**`--stats` - Check your vault**
|
|
196
|
+
|
|
197
|
+
**How to use it**:
|
|
198
|
+
```bash
|
|
199
|
+
semantic-pages --notes ./vault --stats
|
|
200
|
+
```
|
|
201
|
+
|
|
202
|
+
**When to use it**: Quick check to see what's in your vault.
|
|
203
|
+
|
|
204
|
+
**What to expect**:
|
|
205
|
+
```
|
|
206
|
+
Notes: 47
|
|
207
|
+
Chunks: 312
|
|
208
|
+
Wikilinks: 89
|
|
209
|
+
Tags: 23 unique
|
|
210
|
+
```
|
|
211
|
+
|
|
212
|
+
---
|
|
213
|
+
|
|
214
|
+
**`--reindex` - Rebuild the index**
|
|
215
|
+
|
|
216
|
+
**How to use it**:
|
|
217
|
+
```bash
|
|
218
|
+
semantic-pages --notes ./vault --reindex
|
|
219
|
+
```
|
|
220
|
+
|
|
221
|
+
**When to use it**:
|
|
222
|
+
- After bulk-adding or modifying notes outside of the MCP tools
|
|
223
|
+
- If the index seems stale or corrupted
|
|
224
|
+
- After changing the embedding model
|
|
225
|
+
|
|
226
|
+
**What to expect**: Full re-parse, re-embed, and re-index of all markdown files. Takes 10-60 seconds depending on vault size and whether the model is cached.
|
|
227
|
+
|
|
228
|
+
---
|
|
229
|
+
|
|
230
|
+
#### MCP Tools
|
|
231
|
+
|
|
232
|
+
When the server is running (via `.mcp.json` or CLI), Claude has access to these 21 tools:
|
|
233
|
+
|
|
234
|
+
##### Search Tools
|
|
51
235
|
|
|
52
236
|
| Tool | Description |
|
|
53
237
|
|------|-------------|
|
|
54
|
-
| `
|
|
55
|
-
| `
|
|
56
|
-
| `
|
|
238
|
+
| `search_semantic` | Vector similarity search — "find notes similar to this idea" |
|
|
239
|
+
| `search_text` | Full-text keyword/regex search with path, tag, and case filters |
|
|
240
|
+
| `search_graph` | Graph traversal — "find notes connected to this concept" |
|
|
241
|
+
| `search_hybrid` | Combined — semantic results re-ranked by graph proximity |
|
|
242
|
+
|
|
243
|
+
**`search_semantic` - Find notes by meaning**
|
|
244
|
+
|
|
245
|
+
**When Claude uses it**: When you ask things like "find notes about deployment strategies" or "what have I written about authentication?"
|
|
246
|
+
|
|
247
|
+
**What to expect**: Returns notes ranked by semantic similarity to your query, with relevance scores and text snippets. Works even if the exact words don't appear in the notes.
|
|
248
|
+
|
|
249
|
+
**Example conversation**:
|
|
250
|
+
```
|
|
251
|
+
You: What notes do I have about scaling microservices?
|
|
252
|
+
Claude: [calls search_semantic with query "scaling microservices"]
|
|
253
|
+
Claude: I found 4 relevant notes:
|
|
254
|
+
1. architecture/scaling-patterns.md (0.87 similarity) — discusses horizontal vs vertical scaling
|
|
255
|
+
2. devops/kubernetes-autoscaling.md (0.82 similarity) — HPA and VPA configuration
|
|
256
|
+
3. architecture/service-mesh.md (0.71 similarity) — mentions scaling in the context of Istio
|
|
257
|
+
4. meeting-notes/2024-03-15.md (0.65 similarity) — team discussion about scaling concerns
|
|
258
|
+
```
|
|
259
|
+
|
|
260
|
+
---
|
|
57
261
|
|
|
58
|
-
**
|
|
262
|
+
**`search_text` - Find exact matches**
|
|
263
|
+
|
|
264
|
+
**When Claude uses it**: When you need exact keyword or regex matches, not semantic similarity.
|
|
265
|
+
|
|
266
|
+
**What to expect**: Returns notes containing the exact pattern, with snippets showing context. Supports:
|
|
267
|
+
- Case-sensitive/insensitive search
|
|
268
|
+
- Regex patterns
|
|
269
|
+
- Path glob filters (e.g., only search in `notes/`)
|
|
270
|
+
- Tag filters (e.g., only search notes tagged `#architecture`)
|
|
271
|
+
|
|
272
|
+
---
|
|
273
|
+
|
|
274
|
+
**`search_graph` - Traverse connections**
|
|
275
|
+
|
|
276
|
+
**When Claude uses it**: When you want to explore how notes are connected — "what's related to this concept?"
|
|
277
|
+
|
|
278
|
+
**What to expect**: Starting from notes matching your concept, does a breadth-first traversal through wikilinks and shared tags, returning all connected notes within the specified depth.
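
A depth-limited breadth-first traversal of this kind could look roughly like the sketch below. The adjacency-list `Graph` type and the `bfsWithinDepth` name are assumptions made for illustration; the package itself builds its graph with graphology.

```typescript
// Adjacency list: note path -> paths it is connected to (links or shared tags).
type Graph = Map<string, string[]>;

// Breadth-first traversal from the start notes, stopping at maxDepth hops.
// Returns each reached note with its hop distance from the nearest start note.
function bfsWithinDepth(graph: Graph, starts: string[], maxDepth: number): Map<string, number> {
  const depth = new Map<string, number>();
  const queue: string[] = [];
  for (const s of starts) {
    if (!depth.has(s)) {
      depth.set(s, 0);
      queue.push(s);
    }
  }
  for (let head = 0; head < queue.length; head++) {
    const node = queue[head];
    const d = depth.get(node)!;
    if (d >= maxDepth) continue; // do not expand past the depth limit
    for (const next of graph.get(node) ?? []) {
      if (!depth.has(next)) {
        depth.set(next, d + 1);
        queue.push(next);
      }
    }
  }
  return depth;
}
```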
|
|
279
|
+
|
|
280
|
+
---
|
|
281
|
+
|
|
282
|
+
**`search_hybrid` - Best of both**
|
|
283
|
+
|
|
284
|
+
**When Claude uses it**: When you want comprehensive results — semantic matches boosted by graph proximity.
|
|
285
|
+
|
|
286
|
+
**What to expect**: Semantic search results re-ranked so that notes which are also graph-connected score higher. Best for "find everything relevant to X."
|
|
287
|
+
|
|
288
|
+
---
|
|
289
|
+
|
|
290
|
+
##### Read Tools
|
|
59
291
|
|
|
60
292
|
| Tool | Description |
|
|
61
293
|
|------|-------------|
|
|
62
|
-
| `
|
|
63
|
-
| `
|
|
64
|
-
| `
|
|
65
|
-
| `move_note` | Move/rename, updates wikilinks across vault |
|
|
294
|
+
| `read_note` | Read full content of a specific note |
|
|
295
|
+
| `read_multiple_notes` | Batch read multiple notes in one call |
|
|
296
|
+
| `list_notes` | List all indexed notes with metadata (title, tags, link count) |
|
|
66
297
|
|
|
67
|
-
|
|
298
|
+
---
|
|
299
|
+
|
|
300
|
+
##### Write Tools
|
|
68
301
|
|
|
69
302
|
| Tool | Description |
|
|
70
303
|
|------|-------------|
|
|
71
|
-
| `
|
|
72
|
-
| `
|
|
73
|
-
| `
|
|
74
|
-
| `
|
|
304
|
+
| `create_note` | Create a new markdown note with optional frontmatter |
|
|
305
|
+
| `update_note` | Edit note content (overwrite, append, prepend, or patch by heading) |
|
|
306
|
+
| `delete_note` | Delete a note (requires explicit confirmation) |
|
|
307
|
+
| `move_note` | Move/rename a note — automatically updates wikilinks across the vault |
|
|
308
|
+
|
|
309
|
+
**`update_note` - Four editing modes**
|
|
75
310
|
|
|
76
|
-
**
|
|
311
|
+
**Modes**:
|
|
312
|
+
- `overwrite` — replace entire content
|
|
313
|
+
- `append` — add to the end
|
|
314
|
+
- `prepend` — add after frontmatter, before existing content
|
|
315
|
+
- `patch-by-heading` — replace the content under a specific heading (preserves other sections)
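
To make the `patch-by-heading` mode concrete, here is a minimal sketch of how a section replacement might work. It rests on two assumptions that the README does not confirm: a section ends at the next heading of the same or a higher level, and a missing heading is appended as a new `##` section. The `patchByHeading` name is hypothetical, and headings inside fenced code blocks are not handled.

```typescript
// Replace the text under `heading` with `newBody`, leaving other sections untouched.
function patchByHeading(markdown: string, heading: string, newBody: string): string {
  const lines = markdown.split("\n");
  const headingRe = /^(#{1,6})\s+(.*?)\s*$/;
  const start = lines.findIndex((l) => {
    const m = headingRe.exec(l);
    return m !== null && m[2] === heading;
  });
  if (start === -1) {
    // Assumed behavior: append a new section when the heading does not exist.
    return `${markdown.trimEnd()}\n\n## ${heading}\n\n${newBody}\n`;
  }
  const level = headingRe.exec(lines[start])![1].length;
  let end = lines.length;
  for (let i = start + 1; i < lines.length; i++) {
    const m = headingRe.exec(lines[i]);
    if (m !== null && m[1].length <= level) {
      end = i; // next heading of the same or a higher level closes the section
      break;
    }
  }
  return [...lines.slice(0, start + 1), "", newBody, "", ...lines.slice(end)].join("\n");
}
```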
|
|
316
|
+
|
|
317
|
+
**Example**:
|
|
318
|
+
```
|
|
319
|
+
You: Add a "Rollback" section to the deployment guide
|
|
320
|
+
Claude: [calls update_note with mode "patch-by-heading", heading "Rollback"]
|
|
321
|
+
Claude: Updated deployment-guide.md — added Rollback section with kubectl rollback instructions.
|
|
322
|
+
```
|
|
323
|
+
|
|
324
|
+
---
|
|
325
|
+
|
|
326
|
+
**`move_note` - Smart rename**
|
|
327
|
+
|
|
328
|
+
**What makes it special**: When you move `user-service.md` to `auth-service.md`, every `[[user-service]]` wikilink in every other note gets updated to `[[auth-service]]` automatically.
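
The link rewrite described above could be approximated with a regular expression, as in this sketch. `renameWikilinks` is a hypothetical helper, and support for `[[name|alias]]` and `[[name#section]]` suffixes is an assumption based on common wikilink syntax rather than something the README states.

```typescript
// Rewrite [[oldName]], [[oldName|alias]] and [[oldName#section]] to point at newName.
function renameWikilinks(content: string, oldName: string, newName: string): string {
  // Escape regex metacharacters in the note name.
  const escaped = oldName.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  const re = new RegExp(`\\[\\[${escaped}((?:#[^\\]|]*)?(?:\\|[^\\]]*)?)\\]\\]`, "g");
  // A replacer function keeps any "$" in newName literal.
  return content.replace(re, (_match, suffix: string) => `[[${newName}${suffix}]]`);
}
```

Matching the closing `]]` right after the optional suffix means a longer name such as `[[user-service-v2]]` is left alone.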
|
|
329
|
+
|
|
330
|
+
---
|
|
331
|
+
|
|
332
|
+
##### Metadata Tools
|
|
77
333
|
|
|
78
334
|
| Tool | Description |
|
|
79
335
|
|------|-------------|
|
|
80
|
-
| `
|
|
81
|
-
| `
|
|
82
|
-
| `
|
|
83
|
-
| `
|
|
336
|
+
| `get_frontmatter` | Read parsed YAML frontmatter as JSON |
|
|
337
|
+
| `update_frontmatter` | Set or delete frontmatter keys atomically (pass `null` to delete) |
|
|
338
|
+
| `manage_tags` | Add, remove, or list tags on a note (frontmatter + inline) |
|
|
339
|
+
| `rename_tag` | Rename a tag across all notes in the vault |
|
|
84
340
|
|
|
85
|
-
**
|
|
341
|
+
**`rename_tag` - Vault-wide tag rename**
|
|
342
|
+
|
|
343
|
+
**When Claude uses it**: When you want to rename `#architecture` to `#arch` everywhere — in frontmatter `tags:` arrays and inline `#tags` across every file.
|
|
344
|
+
|
|
345
|
+
**What to expect**: Returns the count of files modified.
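
As a rough illustration of the inline half of this operation, the sketch below renames `#tags` in note bodies held in memory and counts the notes that changed. The function names are hypothetical, frontmatter `tags:` arrays are deliberately left out, and the rule for where a tag ends (the first character that is not a word character, hyphen, or slash) is an assumption.

```typescript
// Rename inline #tags in one note body without touching longer tags that
// merely start with the same text (for example #architecture-old).
function renameInlineTag(body: string, oldTag: string, newTag: string): string {
  const escaped = oldTag.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  const re = new RegExp(`(^|\\s)#${escaped}(?![\\w/-])`, "g");
  return body.replace(re, (_m, lead: string) => `${lead}#${newTag}`);
}

// Apply the rename to an in-memory vault and count the notes that changed.
function renameTagEverywhere(notes: Map<string, string>, oldTag: string, newTag: string): number {
  let modified = 0;
  notes.forEach((body, path) => {
    const updated = renameInlineTag(body, oldTag, newTag);
    if (updated !== body) {
      notes.set(path, updated);
      modified++;
    }
  });
  return modified;
}
```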
|
|
346
|
+
|
|
347
|
+
---
|
|
348
|
+
|
|
349
|
+
##### Graph Tools
|
|
86
350
|
|
|
87
351
|
| Tool | Description |
|
|
88
352
|
|------|-------------|
|
|
89
|
-
| `
|
|
90
|
-
| `
|
|
353
|
+
| `backlinks` | All notes that link TO a given note via `[[wikilinks]]` |
|
|
354
|
+
| `forwardlinks` | All notes linked FROM a given note |
|
|
355
|
+
| `graph_path` | Shortest path between two notes in the knowledge graph |
|
|
356
|
+
| `graph_statistics` | Most connected nodes, orphan count, graph density |
|
|
91
357
|
|
|
92
|
-
|
|
358
|
+
**`graph_path` - Find connections between notes**
|
|
93
359
|
|
|
94
|
-
|
|
95
|
-
|
|
96
|
-
|
|
97
|
-
|
|
98
|
-
|
|
99
|
-
|
|
360
|
+
**Example conversation**:
|
|
361
|
+
```
|
|
362
|
+
You: How are the deployment guide and the user service connected?
|
|
363
|
+
Claude: [calls graph_path from "deployment-guide.md" to "user-service.md"]
|
|
364
|
+
Claude: Path: deployment-guide.md → microservices.md → user-service.md
|
|
365
|
+
The deployment guide links to the microservices overview, which links to the user service.
|
|
366
|
+
```
|
|
100
367
|
|
|
101
|
-
|
|
368
|
+
---
|
|
102
369
|
|
|
103
|
-
|
|
370
|
+
**`graph_statistics` - Vault health overview**
|
|
104
371
|
|
|
105
|
-
|
|
106
|
-
|
|
107
|
-
|
|
372
|
+
**What to expect**:
|
|
373
|
+
```json
|
|
374
|
+
{
|
|
375
|
+
"totalNodes": 47,
|
|
376
|
+
"totalEdges": 89,
|
|
377
|
+
"orphanCount": 3,
|
|
378
|
+
"mostConnected": [
|
|
379
|
+
{ "path": "project-overview.md", "connections": 12 },
|
|
380
|
+
{ "path": "microservices.md", "connections": 9 }
|
|
381
|
+
],
|
|
382
|
+
"density": 0.04
|
|
383
|
+
}
|
|
384
|
+
```
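
The `density` value in this sample is consistent with the usual definition for a directed graph without self-loops, edges divided by `nodes * (nodes - 1)`. Whether the tool computes it exactly this way is an assumption; the sketch below only shows the arithmetic, and `directedDensity` is a hypothetical helper.

```typescript
// Density of a directed graph without self-loops: edges / (nodes * (nodes - 1)).
function directedDensity(nodes: number, edges: number): number {
  if (nodes < 2) return 0; // no possible edges between distinct nodes
  return edges / (nodes * (nodes - 1));
}

// Sample numbers from the output above: 89 / (47 * 46), which rounds to 0.04.
const sampleDensity = directedDensity(47, 89);
```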
|
|
108
385
|
|
|
109
|
-
|
|
110
|
-
semantic-pages --notes ./vault --stats
|
|
386
|
+
---
|
|
111
387
|
|
|
112
|
-
|
|
113
|
-
semantic-pages --notes ./vault --reindex
|
|
388
|
+
##### System Tools
|
|
114
389
|
|
|
115
|
-
|
|
116
|
-
|
|
390
|
+
| Tool | Description |
|
|
391
|
+
|------|-------------|
|
|
392
|
+
| `get_stats` | Vault stats — total notes, chunks, embeddings, graph density, model info |
|
|
393
|
+
| `reindex` | Force full reindex of the vault |
|
|
394
|
+
|
|
395
|
+
---
|
|
396
|
+
|
|
397
|
+
## Common Workflows
|
|
398
|
+
|
|
399
|
+
### Quick Vault Check (10 seconds)
|
|
400
|
+
```bash
|
|
401
|
+
semantic-pages --notes ./vault --stats
|
|
117
402
|
```
|
|
118
403
|
|
|
119
|
-
|
|
404
|
+
### Adding Semantic Pages to a Project (2 minutes)
|
|
405
|
+
```bash
|
|
406
|
+
# Step 1: Create .mcp.json in your project root
|
|
407
|
+
echo '{
|
|
408
|
+
"mcpServers": { "semantic-pages": {
|
|
409
|
+
"command": "npx",
|
|
410
|
+
"args": ["-y", "semantic-pages", "--notes", "./notes"]
|
|
411
|
+
} }
|
|
412
|
+
}' > .mcp.json
|
|
120
413
|
|
|
121
|
-
|
|
122
|
-
|
|
414
|
+
# Step 2: Add index to .gitignore
|
|
415
|
+
echo ".semantic-pages-index/" >> .gitignore
|
|
123
416
|
|
|
124
|
-
|
|
125
|
-
|
|
417
|
+
# Step 3: Start Claude — it now has 21 note tools
|
|
418
|
+
claude
|
|
419
|
+
```
|
|
126
420
|
|
|
127
|
-
|
|
128
|
-
await embedder.init();
|
|
129
|
-
const vec = await embedder.embed("search query");
|
|
421
|
+
### Asking Claude About Your Notes
|
|
130
422
|
```
|
|
423
|
+
You: What have I written about authentication?
|
|
424
|
+
Claude: [calls search_semantic] I found 3 notes about authentication...
|
|
425
|
+
|
|
426
|
+
You: What links to the API gateway doc?
|
|
427
|
+
Claude: [calls backlinks] 4 notes link to api-gateway.md...
|
|
131
428
|
|
|
132
|
-
|
|
429
|
+
You: Create a new note summarizing today's meeting
|
|
430
|
+
Claude: [calls create_note] Created meeting-2024-03-15.md with frontmatter...
|
|
431
|
+
|
|
432
|
+
You: Rename the #backend tag to #server across all notes
|
|
433
|
+
Claude: [calls rename_tag] Renamed #backend to #server in 12 files.
|
|
434
|
+
```
|
|
133
435
|
|
|
436
|
+
### Per-Repo Pattern
|
|
134
437
|
```
|
|
135
438
|
any-repo/
|
|
136
439
|
├── notes/ # your markdown files
|
|
@@ -141,10 +444,257 @@ any-repo/
|
|
|
141
444
|
|
|
142
445
|
Each repo gets its own independent knowledge base. No shared state between projects.
|
|
143
446
|
|
|
447
|
+
---
|
|
448
|
+
|
|
449
|
+
## Technical Details
|
|
450
|
+
|
|
451
|
+
### Architecture Overview
|
|
452
|
+
|
|
453
|
+
Semantic Pages is built with TypeScript and organized into a core library with thin transport layers:
|
|
454
|
+
|
|
455
|
+
```
|
|
456
|
+
src/
|
|
457
|
+
├── core/ # Pure library — no transport assumptions
|
|
458
|
+
│ ├── index.ts # Core exports
|
|
459
|
+
│ ├── types.ts # Shared type definitions
|
|
460
|
+
│ ├── indexer.ts # Markdown parser (unified + remark)
|
|
461
|
+
│ ├── embedder.ts # Local embedding model (@huggingface/transformers)
|
|
462
|
+
│ ├── graph.ts # Knowledge graph (graphology)
|
|
463
|
+
│ ├── vector.ts # HNSW vector index (hnswlib-node)
|
|
464
|
+
│ ├── search-text.ts # Full-text / regex search
|
|
465
|
+
│ ├── crud.ts # Create/update/delete/move notes
|
|
466
|
+
│ ├── frontmatter.ts # Frontmatter + tag management
|
|
467
|
+
│ └── watcher.ts # File watcher (chokidar)
|
|
468
|
+
│
|
|
469
|
+
├── mcp/ # MCP stdio server (thin wrapper over core)
|
|
470
|
+
│ └── server.ts # Server setup + 21 tool definitions
|
|
471
|
+
│
|
|
472
|
+
└── cli/ # CLI entrypoint
|
|
473
|
+
└── index.ts # commander-based CLI
|
|
474
|
+
```
|
|
475
|
+
|
|
476
|
+
### Tech Stack
|
|
477
|
+
|
|
478
|
+
| Concern | Package | Why |
|
|
479
|
+
|---------|---------|-----|
|
|
480
|
+
| Markdown parsing | `unified` + `remark-parse` | AST-based, handles wikilinks |
|
|
481
|
+
| Frontmatter | `gray-matter` | YAML/TOML frontmatter extraction |
|
|
482
|
+
| Wikilinks | `remark-wiki-link` | `[[note-name]]` extraction from AST |
|
|
483
|
+
| Embeddings | `@huggingface/transformers` | WASM runtime, no Python, no API key |
|
|
484
|
+
| Embedding model | `nomic-embed-text-v1.5` | High quality, ~80MB, runs locally |
|
|
485
|
+
| Vector index | `hnswlib-node` | HNSW algorithm, the same approximate nearest neighbor approach many production vector databases use |
|
|
486
|
+
| Knowledge graph | `graphology` | Directed graph, serializable, rich algorithms |
|
|
487
|
+
| Graph algorithms | `graphology-traversal` + `graphology-shortest-path` | BFS, shortest path |
|
|
488
|
+
| File watching | `chokidar` | Cross-platform, debounced |
|
|
489
|
+
| MCP server | `@modelcontextprotocol/sdk` | Official MCP TypeScript SDK |
|
|
490
|
+
| CLI | `commander` | Standard Node.js CLI framework |
|
|
491
|
+
|
|
492
|
+
### Index Layout
|
|
493
|
+
|
|
494
|
+
```
|
|
495
|
+
.semantic-pages-index/ # gitignored, rebuilt on demand
|
|
496
|
+
├── embeddings.json # serialized chunk vectors
|
|
497
|
+
├── hnsw.bin # HNSW vector index
|
|
498
|
+
├── hnsw-meta.json # chunk → document mapping
|
|
499
|
+
├── graph.json # knowledge graph (graphology format)
|
|
500
|
+
└── meta.json # index metadata (vault path, model, timestamp)
|
|
501
|
+
```
|
|
502
|
+
|
|
503
|
+
### Document Processing Pipeline
|
|
504
|
+
|
|
505
|
+
#### Step 1: Parse
|
|
506
|
+
```
|
|
507
|
+
.md file → gray-matter (frontmatter) → remark (AST) → extract:
|
|
508
|
+
- title (frontmatter > first heading > filename)
|
|
509
|
+
- wikilinks ([[note-name]])
|
|
510
|
+
- tags (frontmatter tags: + inline #tags)
|
|
511
|
+
- headers (H1-H6)
|
|
512
|
+
- plain text (markdown stripped)
|
|
513
|
+
```
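
The extraction rules in this step can be sketched without the real parser stack. The helpers below are hypothetical and use plain regular expressions in place of `gray-matter` and `remark`; they only illustrate the title precedence (frontmatter, then first heading, then filename) and wikilink target extraction, with `|alias` and `#section` handling as an assumption.

```typescript
// Title precedence: frontmatter title > first heading > filename without ".md".
function resolveTitle(frontmatter: Record<string, unknown>, body: string, filePath: string): string {
  const fmTitle = frontmatter["title"];
  if (typeof fmTitle === "string" && fmTitle.trim() !== "") {
    return fmTitle.trim();
  }
  const heading = /^#{1,6}\s+(.+?)\s*$/m.exec(body);
  if (heading !== null) return heading[1];
  const base = filePath.split("/").pop() ?? filePath;
  return base.replace(/\.md$/i, "");
}

// Collect unique [[wikilink]] targets, dropping any |alias or #section suffix.
function extractWikilinks(body: string): string[] {
  const re = /\[\[([^\]|#]+)(?:[#|][^\]]*)?\]\]/g;
  const targets: string[] = [];
  let m: RegExpExecArray | null;
  while ((m = re.exec(body)) !== null) {
    const target = m[1].trim();
    if (!targets.includes(target)) targets.push(target);
  }
  return targets;
}
```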
|
|
514
|
+
|
|
515
|
+
#### Step 2: Chunk
|
|
516
|
+
```
|
|
517
|
+
Plain text → split at sentence boundaries → ~512 token chunks
|
|
518
|
+
```
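
A sentence-boundary chunker along these lines might look like the sketch below. It approximates tokens by whitespace-separated words, which is an assumption; the real pipeline presumably counts tokens with the model's tokenizer. `chunkBySentence` is a hypothetical name.

```typescript
// Split text into sentences, then pack sentences into chunks of at most
// maxTokens (approximated here as word count). A single sentence longer
// than maxTokens becomes its own oversized chunk.
function chunkBySentence(text: string, maxTokens = 512): string[] {
  const sentences = (text.match(/[^.!?]+[.!?]*/g) ?? [])
    .map((s) => s.trim())
    .filter((s) => s.length > 0);
  const chunks: string[] = [];
  let current: string[] = [];
  let count = 0;
  for (const sentence of sentences) {
    const tokens = sentence.split(/\s+/).length;
    if (count + tokens > maxTokens && current.length > 0) {
      chunks.push(current.join(" "));
      current = [];
      count = 0;
    }
    current.push(sentence);
    count += tokens;
  }
  if (current.length > 0) chunks.push(current.join(" "));
  return chunks;
}
```

Keeping sentences whole avoids cutting a thought in half, at the cost of chunks that vary in length.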
|
|
519
|
+
|
|
520
|
+
#### Step 3: Embed
|
|
521
|
+
```
|
|
522
|
+
Each chunk → nomic-embed-text-v1.5 (WASM) → normalized Float32Array
|
|
523
|
+
```
|
|
524
|
+
|
|
525
|
+
#### Step 4: Index
|
|
526
|
+
```
|
|
527
|
+
Embeddings → HNSW index (hnswlib-node)
|
|
528
|
+
Wikilinks + tags → directed graph (graphology)
|
|
529
|
+
```
|
|
530
|
+
|
|
531
|
+
#### Step 5: Serve
|
|
532
|
+
```
|
|
533
|
+
MCP tools → query embeddings / graph / files → return results
|
|
534
|
+
```
|
|
535
|
+
|
|
536
|
+
### Using as a Library
|
|
537
|
+
|
|
538
|
+
The core library is importable independently of the MCP server:
|
|
539
|
+
|
|
540
|
+
```typescript
|
|
541
|
+
import { Indexer, Embedder, GraphBuilder, VectorIndex, TextSearch } from "@theglitchking/semantic-pages";
|
|
542
|
+
|
|
543
|
+
// Index all notes
|
|
544
|
+
const indexer = new Indexer("./vault");
|
|
545
|
+
const docs = await indexer.indexAll();
|
|
546
|
+
|
|
547
|
+
// Build embeddings
|
|
548
|
+
const embedder = new Embedder();
|
|
549
|
+
await embedder.init();
|
|
550
|
+
const chunks = docs.flatMap(d => d.chunks);
|
|
551
|
+
const vecs = await embedder.embedBatch(chunks);
|
|
552
|
+
|
|
553
|
+
// Build vector index
|
|
554
|
+
const vectorIndex = new VectorIndex(embedder.getDimensions());
|
|
555
|
+
// Pair each chunk with the note it came from (same order as `chunks` and `vecs`)
|
|
556
|
+
vectorIndex.build(vecs, docs.flatMap(d => d.chunks.map(text => ({
|
|
557
|
+
docPath: d.path,
|
|
558
|
+
text
|
|
559
|
+
}))).map((meta, i) => ({ ...meta, chunkIndex: i })));
|
|
560
|
+
|
|
561
|
+
// Search
|
|
562
|
+
const queryVec = await embedder.embed("microservices architecture");
|
|
563
|
+
const results = vectorIndex.search(queryVec, 5);
|
|
564
|
+
|
|
565
|
+
// Build knowledge graph
|
|
566
|
+
const graph = new GraphBuilder();
|
|
567
|
+
graph.buildFromDocuments(docs);
|
|
568
|
+
const backlinks = graph.backlinks("project-overview.md");
|
|
569
|
+
const path = graph.findPath("overview.md", "auth.md");
|
|
570
|
+
```
|
|
571
|
+
|
|
572
|
+
### Performance
|
|
573
|
+
|
|
574
|
+
| Metric | Value |
|
|
575
|
+
|--------|-------|
|
|
576
|
+
| Index 100 notes | ~5 seconds |
|
|
577
|
+
| Index 1,000 notes | ~30 seconds |
|
|
578
|
+
| Semantic search latency | <100ms |
|
|
579
|
+
| Text search latency | <10ms |
|
|
580
|
+
| Graph traversal latency | <5ms |
|
|
581
|
+
| Model download (first run) | ~80MB, cached at `~/.semantic-pages/models/` |
|
|
582
|
+
| Index size (100 notes) | ~10MB |
|
|
583
|
+
| npm package size | 85.7 kB |
|
|
584
|
+
|
|
585
|
+
---
|
|
586
|
+
|
|
144
587
|
## Requirements
|
|
145
588
|
|
|
146
|
-
- Node.js
|
|
589
|
+
- **Node.js**: Version 18.0.0 or higher
|
|
590
|
+
- **Operating System**: Linux, macOS, or Windows (with WSL2)
|
|
591
|
+
- **Disk Space**: ~80MB for the embedding model (downloaded once)
|
|
592
|
+
|
|
593
|
+
---
|
|
594
|
+
|
|
595
|
+
## Troubleshooting
|
|
596
|
+
|
|
597
|
+
### Installation Issues
|
|
598
|
+
|
|
599
|
+
**Problem**: `npx semantic-pages` fails or shows "not found"
|
|
600
|
+
|
|
601
|
+
**Solution**:
|
|
602
|
+
```bash
|
|
603
|
+
# Skip the install confirmation prompt and retry
|
|
604
|
+
npx --yes semantic-pages --notes ./vault --stats
|
|
605
|
+
|
|
606
|
+
# Or install globally
|
|
607
|
+
npm install -g @theglitchking/semantic-pages
|
|
608
|
+
```
|
|
609
|
+
|
|
610
|
+
**Problem**: Model download fails
|
|
611
|
+
|
|
612
|
+
**Solution**:
|
|
613
|
+
```bash
|
|
614
|
+
# Check internet connection, then retry
|
|
615
|
+
# The model is cached at ~/.semantic-pages/models/
|
|
616
|
+
# Delete and re-download if corrupted:
|
|
617
|
+
rm -rf ~/.semantic-pages/models/
|
|
618
|
+
semantic-pages --notes ./vault --reindex
|
|
619
|
+
```
|
|
620
|
+
|
|
621
|
+
### Usage Issues
|
|
622
|
+
|
|
623
|
+
**Problem**: Search returns no results
|
|
624
|
+
|
|
625
|
+
**Solution**:
|
|
626
|
+
```bash
|
|
627
|
+
# Force reindex
|
|
628
|
+
semantic-pages --notes ./vault --reindex
|
|
629
|
+
|
|
630
|
+
# Check that .md files exist in the path
|
|
631
|
+
ls ./vault/*.md
|
|
632
|
+
```
|
|
633
|
+
|
|
634
|
+
**Problem**: Index seems stale after editing files externally
|
|
635
|
+
|
|
636
|
+
**Solution**: The file watcher should catch changes, but if it misses some:
|
|
637
|
+
```bash
|
|
638
|
+
# Force reindex
|
|
639
|
+
semantic-pages --notes ./vault --reindex
|
|
640
|
+
```
|
|
641
|
+
|
|
642
|
+
**Problem**: `hnswlib-node` fails to install (native addon)
|
|
643
|
+
|
|
644
|
+
**Solution**:
|
|
645
|
+
```bash
|
|
646
|
+
# Install build tools
|
|
647
|
+
# On Ubuntu/Debian:
|
|
648
|
+
sudo apt install build-essential python3
|
|
649
|
+
|
|
650
|
+
# On macOS:
|
|
651
|
+
xcode-select --install
|
|
652
|
+
|
|
653
|
+
# Then retry
|
|
654
|
+
npm install -g @theglitchking/semantic-pages
|
|
655
|
+
```
|
|
656
|
+
|
|
657
|
+
---
|
|
658
|
+
|
|
659
|
+
## Contributing
|
|
660
|
+
|
|
661
|
+
Contributions are welcome! The project uses:
|
|
662
|
+
- **TypeScript** with strict mode
|
|
663
|
+
- **tsup** for bundling (ESM)
|
|
664
|
+
- **vitest** for testing (123 tests across 11 suites)
|
|
665
|
+
|
|
666
|
+
```bash
|
|
667
|
+
# Clone and install
|
|
668
|
+
git clone https://github.com/TheGlitchKing/semantic-pages.git
|
|
669
|
+
cd semantic-pages
|
|
670
|
+
npm install
|
|
671
|
+
|
|
672
|
+
# Run tests
|
|
673
|
+
npm test
|
|
674
|
+
|
|
675
|
+
# Build
|
|
676
|
+
npm run build
|
|
677
|
+
|
|
678
|
+
# Type check
|
|
679
|
+
npm run lint
|
|
680
|
+
```
|
|
681
|
+
|
|
682
|
+
---
|
|
147
683
|
|
|
148
684
|
## License
|
|
149
685
|
|
|
150
|
-
MIT
|
|
686
|
+
MIT License - see [LICENSE](./LICENSE) file for details.
|
|
687
|
+
|
|
688
|
+
---
|
|
689
|
+
|
|
690
|
+
## Support
|
|
691
|
+
|
|
692
|
+
- **GitHub Issues**: [Report bugs or request features](https://github.com/TheGlitchKing/semantic-pages/issues)
|
|
693
|
+
- **NPM Package**: [@theglitchking/semantic-pages](https://www.npmjs.com/package/@theglitchking/semantic-pages)
|
|
694
|
+
- **Marketplace**: [Glitch Kingdom of Plugins](https://github.com/TheGlitchKing/glitch-kingdom-of-plugins)
|
|
695
|
+
|
|
696
|
+
---
|
|
697
|
+
|
|
698
|
+
**Made with care by TheGlitchKing**
|
|
699
|
+
|
|
700
|
+
[NPM](https://www.npmjs.com/package/@theglitchking/semantic-pages) | [GitHub](https://github.com/TheGlitchKing/semantic-pages) | [Issues](https://github.com/TheGlitchKing/semantic-pages/issues)
|