tokenos 1.1.0

Files changed (111)
  1. package/README.md +571 -0
  2. package/USAGE.md +451 -0
  3. package/dist/config.d.ts +22 -0
  4. package/dist/config.d.ts.map +1 -0
  5. package/dist/config.js +60 -0
  6. package/dist/config.js.map +1 -0
  7. package/dist/db/connection.d.ts +3 -0
  8. package/dist/db/connection.d.ts.map +1 -0
  9. package/dist/db/connection.js +78 -0
  10. package/dist/db/connection.js.map +1 -0
  11. package/dist/db/index.d.ts +4 -0
  12. package/dist/db/index.d.ts.map +1 -0
  13. package/dist/db/index.js +4 -0
  14. package/dist/db/index.js.map +1 -0
  15. package/dist/db/memory.d.ts +6 -0
  16. package/dist/db/memory.d.ts.map +1 -0
  17. package/dist/db/memory.js +62 -0
  18. package/dist/db/memory.js.map +1 -0
  19. package/dist/db/queries.d.ts +29 -0
  20. package/dist/db/queries.d.ts.map +1 -0
  21. package/dist/db/queries.js +215 -0
  22. package/dist/db/queries.js.map +1 -0
  23. package/dist/embeddings/client.d.ts +16 -0
  24. package/dist/embeddings/client.d.ts.map +1 -0
  25. package/dist/embeddings/client.js +70 -0
  26. package/dist/embeddings/client.js.map +1 -0
  27. package/dist/embeddings/index.d.ts +11 -0
  28. package/dist/embeddings/index.d.ts.map +1 -0
  29. package/dist/embeddings/index.js +37 -0
  30. package/dist/embeddings/index.js.map +1 -0
  31. package/dist/embeddings/similarity.d.ts +7 -0
  32. package/dist/embeddings/similarity.d.ts.map +1 -0
  33. package/dist/embeddings/similarity.js +31 -0
  34. package/dist/embeddings/similarity.js.map +1 -0
  35. package/dist/indexer/cli.d.ts +8 -0
  36. package/dist/indexer/cli.d.ts.map +1 -0
  37. package/dist/indexer/cli.js +21 -0
  38. package/dist/indexer/cli.js.map +1 -0
  39. package/dist/indexer/ignore.d.ts +4 -0
  40. package/dist/indexer/ignore.d.ts.map +1 -0
  41. package/dist/indexer/ignore.js +30 -0
  42. package/dist/indexer/ignore.js.map +1 -0
  43. package/dist/indexer/index.d.ts +5 -0
  44. package/dist/indexer/index.d.ts.map +1 -0
  45. package/dist/indexer/index.js +4 -0
  46. package/dist/indexer/index.js.map +1 -0
  47. package/dist/indexer/indexer.d.ts +13 -0
  48. package/dist/indexer/indexer.d.ts.map +1 -0
  49. package/dist/indexer/indexer.js +125 -0
  50. package/dist/indexer/indexer.js.map +1 -0
  51. package/dist/indexer/parser.d.ts +10 -0
  52. package/dist/indexer/parser.d.ts.map +1 -0
  53. package/dist/indexer/parser.js +444 -0
  54. package/dist/indexer/parser.js.map +1 -0
  55. package/dist/indexer/watcher.d.ts +7 -0
  56. package/dist/indexer/watcher.d.ts.map +1 -0
  57. package/dist/indexer/watcher.js +64 -0
  58. package/dist/indexer/watcher.js.map +1 -0
  59. package/dist/main.d.ts +3 -0
  60. package/dist/main.d.ts.map +1 -0
  61. package/dist/main.js +92 -0
  62. package/dist/main.js.map +1 -0
  63. package/dist/reset.d.ts +6 -0
  64. package/dist/reset.d.ts.map +1 -0
  65. package/dist/reset.js +23 -0
  66. package/dist/reset.js.map +1 -0
  67. package/dist/server/index.d.ts +2 -0
  68. package/dist/server/index.d.ts.map +1 -0
  69. package/dist/server/index.js +2 -0
  70. package/dist/server/index.js.map +1 -0
  71. package/dist/server/server.d.ts +4 -0
  72. package/dist/server/server.d.ts.map +1 -0
  73. package/dist/server/server.js +558 -0
  74. package/dist/server/server.js.map +1 -0
  75. package/dist/server/visualize.d.ts +2 -0
  76. package/dist/server/visualize.d.ts.map +1 -0
  77. package/dist/server/visualize.js +299 -0
  78. package/dist/server/visualize.js.map +1 -0
  79. package/dist/test-phase1.d.ts +13 -0
  80. package/dist/test-phase1.d.ts.map +1 -0
  81. package/dist/test-phase1.js +90 -0
  82. package/dist/test-phase1.js.map +1 -0
  83. package/dist/test-phase2.d.ts +13 -0
  84. package/dist/test-phase2.d.ts.map +1 -0
  85. package/dist/test-phase2.js +110 -0
  86. package/dist/test-phase2.js.map +1 -0
  87. package/dist/test-phase3.d.ts +12 -0
  88. package/dist/test-phase3.d.ts.map +1 -0
  89. package/dist/test-phase3.js +85 -0
  90. package/dist/test-phase3.js.map +1 -0
  91. package/dist/types.d.ts +73 -0
  92. package/dist/types.d.ts.map +1 -0
  93. package/dist/types.js +3 -0
  94. package/dist/types.js.map +1 -0
  95. package/dist/utils/cache.d.ts +12 -0
  96. package/dist/utils/cache.d.ts.map +1 -0
  97. package/dist/utils/cache.js +45 -0
  98. package/dist/utils/cache.js.map +1 -0
  99. package/dist/utils/logger.d.ts +16 -0
  100. package/dist/utils/logger.d.ts.map +1 -0
  101. package/dist/utils/logger.js +52 -0
  102. package/dist/utils/logger.js.map +1 -0
  103. package/dist/utils/scoring.d.ts +15 -0
  104. package/dist/utils/scoring.d.ts.map +1 -0
  105. package/dist/utils/scoring.js +17 -0
  106. package/dist/utils/scoring.js.map +1 -0
  107. package/dist/verify-parser.d.ts +6 -0
  108. package/dist/verify-parser.d.ts.map +1 -0
  109. package/dist/verify-parser.js +105 -0
  110. package/dist/verify-parser.js.map +1 -0
  111. package/package.json +52 -0
package/README.md ADDED
@@ -0,0 +1,571 @@
1
+ # TokenOS
2
+
3
+ > **Local-first codebase graph intelligence for AI assistants — powered by SQLite, ts-morph, and Ollama.**
4
+
5
+ `TokenOS` is a [Model Context Protocol (MCP)](https://modelcontextprotocol.io/) server that statically analyses your TypeScript/TSX codebase, stores it as a structural dependency graph in SQLite, optionally enriches nodes with semantic embeddings via Ollama, and exposes high-precision query tools for AI coding assistants like Claude, Cursor, or any MCP-compatible client.
6
+
7
+ **The goal**: When you start a new chat, the AI already knows your codebase structure. No more "let me analyze all files first" — it queries the graph and gets exactly what it needs, saving tokens and compute.
8
+
9
+ ---
10
+
11
+ ## Table of Contents
12
+
13
+ - [Quick Start](#quick-start)
14
+ - [Commands](#commands)
15
+ - [Database Location](#database-location)
16
+ - [MCP Client Configuration](#mcp-client-configuration)
17
+ - [MCP Tools](#mcp-tools)
18
+ - [Node Types](#node-types)
19
+ - [Edge Types](#edge-types)
20
+ - [Importance Scoring](#importance-scoring)
21
+ - [Semantic Meta Enrichment](#semantic-meta-enrichment)
22
+ - [Conversation Memory](#conversation-memory)
23
+ - [Visualization Dashboard](#visualization-dashboard)
24
+ - [Changing the Embedding Model](#changing-the-embedding-model)
25
+ - [Architecture](#architecture)
26
+ - [Configuration Reference](#configuration-reference)
27
+ - [Prerequisites](#prerequisites)
28
+ - [Tech Stack](#tech-stack)
29
+ - [Limitations](#limitations)
30
+ - [License](#license)
31
+
32
+ ---
33
+
34
+ ## Quick Start
35
+
36
+ ### 1. Install
37
+
38
+ ```bash
39
+ git clone https://github.com/wripcode/TokenOS.git
40
+ cd TokenOS
41
+ npm install
42
+ ```
43
+
44
+ ### 2. Configure
45
+
46
+ Edit `tokenos.config.json` in the project root:
47
+
48
+ ```json
49
+ {
50
+ "watchPath": "/absolute/path/to/your/project",
51
+ "ollama": {
52
+ "url": "http://localhost:11434",
53
+ "model": "mxbai-embed-large:latest"
54
+ },
55
+ "ui": {
56
+ "enabled": false,
57
+ "port": 3333
58
+ }
59
+ }
60
+ ```
61
+
62
+ ### 3. Run
63
+
64
+ ```bash
65
+ npm run dev
66
+ ```
67
+
68
+ That's it. The server will:
69
+
70
+ 1. Parse all `.ts`/`.tsx` files via ts-morph AST analysis
71
+ 2. Extract nodes (functions, classes, components, interfaces, types, enums, routes, variables, imports)
72
+ 3. Extract edges (CALLS, IMPORTS, EXPORTS, EXTENDS, IMPLEMENTS, DEFINES, RENDERS, CONTAINS, TYPE_OF, PART_OF_TAB)
73
+ 4. Store everything in a per-project SQLite database
74
+ 5. Auto-generate summaries for all nodes
75
+ 6. Back-fill semantic embeddings via Ollama (if running)
76
+ 7. Compute importance scores for architectural ranking
77
+ 8. Start a chokidar file watcher for real-time incremental updates
78
+ 9. Serve 6 MCP tools over stdio transport
79
+ 10. Optionally launch an interactive graph visualization dashboard
80
+
81
+ ---
82
+
83
+ ## Commands
84
+
85
+ | Command | Description |
86
+ |---|---|
87
+ | `npm run dev` | Start the MCP server (reads `tokenos.config.json`) |
88
+ | `npm run reset` | Delete the project database — next `npm run dev` re-indexes from scratch |
89
+ | `npm run build` | Compile TypeScript to `dist/` |
90
+ | `npm run index -- /path` | One-shot indexing of a directory (no server, no watcher) |
91
+
92
+ ---
93
+
94
+ ## Database Location
95
+
96
+ Each project gets its own isolated database at:
97
+
98
+ ```
99
+ <your-project>/.tokenos/graph.db
100
+ ```
101
+
102
+ This means:
103
+ - ✅ Different projects never mix data
104
+ - ✅ You can delete `.tokenos/` to reset a specific project
105
+ - ✅ Add `.tokenos/` to your project's `.gitignore`
106
+
107
+ SQLite is configured with **WAL mode** for better concurrent read performance and foreign key enforcement.
108
+
109
+ ### Schema
110
+
111
+ Three tables are created automatically:
112
+
113
+ | Table | Purpose |
114
+ |---|---|
115
+ | `nodes` | All code entities (functions, classes, components, etc.) with metadata, summaries, embeddings, and importance scores |
116
+ | `edges` | All relationships between nodes (CALLS, IMPORTS, RENDERS, etc.) with unique constraint on `(from_node, to_node, type)` |
117
+ | `memories` | Conversation memory storage for persistent context across sessions |
118
+
119
+ ---
120
+
121
+ ## MCP Client Configuration
122
+
123
+ ### Claude Desktop
124
+
125
+ Add to `claude_desktop_config.json`:
126
+
127
+ ```json
128
+ {
129
+ "mcpServers": {
130
+ "tokenos": {
131
+ "command": "node",
132
+ "args": ["/absolute/path/to/TokenOS/dist/main.js"]
133
+ }
134
+ }
135
+ }
136
+ ```
137
+
138
+ ### Development mode (tsx)
139
+
140
+ ```json
141
+ {
142
+ "mcpServers": {
143
+ "tokenos": {
144
+ "command": "npx",
145
+ "args": ["tsx", "/absolute/path/to/TokenOS/src/main.ts"]
146
+ }
147
+ }
148
+ }
149
+ ```
150
+
151
+ > **Note:** There is no need to pass the project path as a CLI argument; the server reads `watchPath` from `tokenos.config.json`.
152
+
153
+ ---
154
+
155
+ ## MCP Tools
156
+
157
+ All tools are **read-only**, **idempotent**, and communicate over **stdio** transport. Responses are truncated at **25,000 characters** to prevent overwhelming the LLM context window.
158
+
159
+ ### `search`
160
+
161
+ Smart search that understands intent and returns the most relevant code and context.
162
+
163
+ ```
164
+ Args:
165
+ query (string) — Natural language question or search term
166
+ response_format (optional) — 'json' (default) or 'markdown'
167
+ ```
168
+
169
+ **How it works:**
170
+
171
+ 1. **Intent detection** — Classifies query into one of four modes:
172
+ - `semantic` (default) — general concept/code search
173
+ - `trace` — triggered by "trace", "flow", "how", "why" → deeper BFS traversal (depth 2)
174
+ - `explore` — triggered by "where", "what", "find" → broad shallow search
175
+ - `dependency` — triggered by "depend", "import", "export" → import/export edges only
176
+ 2. **Hybrid search** — Combines text name/meta matching with Ollama semantic similarity
177
+ 3. **Graph expansion** — BFS-expands top results into a contextualized subgraph
178
+ 4. **Memory retrieval** — Appends relevant conversation memories (top 3)
179
+
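The intent-detection step above can be sketched as a keyword lookup. This is a minimal illustration, not the package's code: the function name `detectIntent`, the substring matching, and the order in which the modes are checked are all assumptions; only the mode names and trigger words come from the README.

```typescript
type SearchMode = "semantic" | "trace" | "explore" | "dependency";

// Trigger words are taken from the README. The check order
// (dependency, then trace, then explore) is an assumption.
const MODE_KEYWORDS: Array<[SearchMode, string[]]> = [
  ["dependency", ["depend", "import", "export"]],
  ["trace", ["trace", "flow", "how", "why"]],
  ["explore", ["where", "what", "find"]],
];

// Hypothetical helper: returns the first mode whose keyword occurs in the query.
function detectIntent(query: string): SearchMode {
  const q = query.toLowerCase();
  for (const [mode, keywords] of MODE_KEYWORDS) {
    if (keywords.some((k) => q.includes(k))) return mode;
  }
  return "semantic"; // default mode when no trigger word matches
}
```

Plain substring matching is the simplest reading of "triggered by"; it also fires on words such as "show" (which contains "how"), so a real implementation may well match on word boundaries instead.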
180
+ **Returns:** Compressed, relevant context (code + relationships + memory)
181
+
182
+ ### `find_nodes`
183
+
184
+ Find code elements by name, type, or meaning.
185
+
186
+ ```
187
+ Args:
188
+ query (string) — Function name, class name, or natural-language description
189
+ type (optional) — 'function' | 'class' | 'file' | 'import' | 'variable' | 'component' | 'interface' | 'type_alias' | 'enum' | 'route'
190
+ mode (optional) — 'text' (default) or 'semantic' (requires Ollama)
191
+ limit (optional) — 1–50, default 10
192
+ offset (optional) — for pagination
193
+ response_format (optional) — 'json' (default) or 'markdown'
194
+ ```
195
+
196
+ **Text mode**: Searches by name (LIKE match) and meta fields (role, tab, feature). When `type` is provided, results are filtered with AND logic.
197
+
198
+ **Semantic mode**: Uses Ollama embeddings for concept-level search. Example: searching "authentication handler" finds `loginUser()` even if "auth" isn't in the name. Falls back to text mode when Ollama is offline.
199
+
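Semantic mode rests on comparing embedding vectors. The sketch below shows cosine similarity with a top-K ranking, which is what the architecture listing attributes to `similarity.ts`. The names `cosineSimilarity`, `topK`, and the `Embedded` shape are illustrative assumptions, and the vectors in the usage example are toy values rather than real Ollama output.

```typescript
// Cosine similarity between two equal-length vectors; returns 0 for a zero vector.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("Vector lengths differ");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  const denom = Math.sqrt(normA) * Math.sqrt(normB);
  return denom === 0 ? 0 : dot / denom;
}

// Hypothetical shape: a node ID plus its stored embedding.
interface Embedded {
  id: string;
  embedding: number[];
}

// Scores every item against the query vector and keeps the k best.
function topK(query: number[], items: Embedded[], k: number) {
  return items
    .map((item) => ({ id: item.id, score: cosineSimilarity(query, item.embedding) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k);
}
```

For example, `topK([1, 0], [{ id: "a", embedding: [1, 0] }, { id: "b", embedding: [0, 1] }, { id: "c", embedding: [1, 1] }], 2)` ranks `a` first and `c` second.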
200
+ **Returns:** List of matching nodes with relevance ranking
201
+
202
+ ### `get_node`
203
+
204
+ Get full details of a specific code element.
205
+
206
+ ```
207
+ Args:
208
+ id (string) — format: 'filePath::name' (e.g. 'src/utils/cache.ts::LRUCache')
209
+ response_format (optional) — 'json' or 'markdown'
210
+ ```
211
+
212
+ **Returns:** Complete node data (code, type, file, importance)
213
+
214
+ ### `get_connections`
215
+
216
+ Get directly related code elements.
217
+
218
+ ```
219
+ Args:
220
+ id (string) — Node ID
221
+ response_format (optional) — 'json' or 'markdown'
222
+ ```
223
+
224
+ **Returns:** Connected nodes and their relationships
225
+
226
+ ### `explore`
227
+
228
+ Explore surrounding code context from a starting point.
229
+
230
+ ```
231
+ Args:
232
+ id (string) — Starting node ID
233
+ depth (optional) — 1–3, default 2
234
+ ```
235
+
236
+ **Returns:** Local graph (nodes + relationships)
237
+
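A depth-limited breadth-first expansion of this kind might look like the following sketch. It assumes an in-memory edge list and follows edges in both directions; the real tool presumably queries SQLite, and whether it treats edges as undirected is not stated in the README. `exploreSubgraph` and the `GraphEdge` shape are hypothetical names.

```typescript
// Hypothetical edge shape, loosely following the README's from/to/type columns.
interface GraphEdge {
  from: string;
  to: string;
  type: string;
}

function exploreSubgraph(edges: GraphEdge[], startId: string, depth: number) {
  const maxDepth = Math.min(3, Math.max(1, depth)); // README range: 1 to 3
  const visited = new Set<string>([startId]);
  const kept: GraphEdge[] = [];
  let frontier = [startId];

  for (let level = 0; level < maxDepth && frontier.length > 0; level++) {
    const next: string[] = [];
    for (const edge of edges) {
      const fromIn = frontier.includes(edge.from);
      const toIn = frontier.includes(edge.to);
      if (!fromIn && !toIn) continue; // edge does not touch the current frontier
      if (!kept.includes(edge)) kept.push(edge);
      // Assumption: traverse in both directions (callers as well as callees).
      const neighbor = fromIn ? edge.to : edge.from;
      if (!visited.has(neighbor)) {
        visited.add(neighbor);
        next.push(neighbor);
      }
    }
    frontier = next;
  }
  return { nodes: Array.from(visited), edges: kept };
}
```

Clamping `depth` keeps the result bounded even when a client passes a value outside the documented range.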
238
+ ### `top_nodes`
239
+
240
+ Get the most important parts of the codebase.
241
+
242
+ ```
243
+ Args:
244
+ limit (optional) — 1–100, default 20
245
+ response_format (optional) — 'json' or 'markdown'
246
+ ```
247
+
248
+ **Returns:** Ranked list of high-impact nodes
249
+
250
+ ---
251
+
252
+ ## Node Types
253
+
254
+ The parser extracts **10 distinct node types** from TypeScript/TSX source files:
255
+
256
+ | Type | Description | Detection Method |
257
+ |---|---|---|
258
+ | `function` | Named functions and arrow functions | `FunctionDeclaration` and `VariableDeclaration → ArrowFunction` |
259
+ | `component` | React/JSX components | PascalCase function that contains JSX elements |
260
+ | `class` | ES6 class declarations | `ClassDeclaration` |
261
+ | `interface` | TypeScript interface declarations | `InterfaceDeclaration` |
262
+ | `type_alias` | TypeScript type aliases | `TypeAliasDeclaration` |
263
+ | `enum` | TypeScript enum declarations | `EnumDeclaration` |
264
+ | `variable` | Exported constants and variables | Exported `VariableStatement` (excluding arrow functions already captured) |
265
+ | `import` | Import declarations | `ImportDeclaration` → keyed as `import:<module_specifier>` |
266
+ | `file` | Source file entry | One per file, named after the basename |
267
+ | `route` | Next.js App Router route | Detected from `app/**/page.tsx` file paths |
268
+
269
+ ---
270
+
271
+ ## Edge Types
272
+
273
+ The parser extracts **10 distinct edge types** representing relationships between nodes:
274
+
275
+ | Edge | Description |
276
+ |---|---|
277
+ | `CALLS` | Function A calls function B (via `CallExpression`) |
278
+ | `IMPORTS` | File imports a module |
279
+ | `EXPORTS` | File exports a symbol |
280
+ | `EXTENDS` | Class or interface extends a base |
281
+ | `IMPLEMENTS` | Class implements an interface |
282
+ | `DEFINES` | File defines a symbol (function, class, variable, etc.) |
283
+ | `RENDERS` | React component renders another component (JSX usage) |
284
+ | `CONTAINS` | Wrapper component contains a child in JSX tree (direct parent-child) |
285
+ | `TYPE_OF` | Symbol references a type or interface |
286
+ | `PART_OF_TAB` | Component belongs to a tab (detected from `TabsContent`, `TabContent`, `TabPanel` wrappers) |
287
+
288
+ ---
289
+
290
+ ## Importance Scoring
291
+
292
+ Every node is scored using a batch SQL computation:
293
+
294
+ ```
295
+ Score = (inDegree × 2) + outDegree + typeWeight
296
+ ```
297
+
298
+ | Type | Weight |
299
+ |---|---|
300
+ | `class` | 3.0 |
301
+ | `component` | 2.5 |
302
+ | `function` | 2.0 |
303
+ | `route` | 1.5 |
304
+ | `interface` | 1.5 |
305
+ | `enum` | 1.5 |
306
+ | `file` | 1.5 |
307
+ | `type_alias` | 1.0 |
308
+ | `variable` | 1.0 |
309
+ | `import` | 0.5 |
310
+
311
+ Importance is computed in a single aggregation query (no N+1), then batch-updated in a transaction.
312
+
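As a worked example of the formula, the sketch below computes the score for a single node in plain TypeScript. The package itself is described as doing this in one SQL aggregation, so the function `importanceScore` is purely illustrative; the weights are copied from the table above.

```typescript
type NodeType =
  | "class" | "component" | "function" | "route" | "interface"
  | "enum" | "file" | "type_alias" | "variable" | "import";

// Type weights copied from the README table.
const TYPE_WEIGHT: Record<NodeType, number> = {
  class: 3.0,
  component: 2.5,
  function: 2.0,
  route: 1.5,
  interface: 1.5,
  enum: 1.5,
  file: 1.5,
  type_alias: 1.0,
  variable: 1.0,
  import: 0.5,
};

// Score = (inDegree × 2) + outDegree + typeWeight
function importanceScore(type: NodeType, inDegree: number, outDegree: number): number {
  return inDegree * 2 + outDegree + TYPE_WEIGHT[type];
}
```

A class with four incoming and three outgoing edges scores `4 × 2 + 3 + 3.0 = 14`, while an import node with a single outgoing edge scores `0 + 1 + 0.5 = 1.5`. Incoming edges count double, so heavily depended-upon symbols rise to the top.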
313
+ ---
314
+
315
+ ## Semantic Meta Enrichment
316
+
317
+ The parser automatically infers rich metadata for each node:
318
+
319
+ ### UI Role Detection
320
+
321
+ Components are assigned a UI `role` based on name/file pattern matching:
322
+
323
+ | Pattern | Role |
324
+ |---|---|
325
+ | `*panel*` | `panel` |
326
+ | `*tab*` | `tab` |
327
+ | `*page*` | `page` |
328
+ | `*dialog*`, `*modal*` | `dialog` |
329
+ | `*form*` | `form` |
330
+ | `*sidebar*`, `*nav*` | `navigation` |
331
+ | `*header*` | `header` |
332
+ | `*footer*` | `footer` |
333
+ | `*content*` | `content` |
334
+ | `*list*` | `list` |
335
+ | `*card*` | `card` |
336
+ | `*button*` | `action` |
337
+ | `*layout*` | `layout` |
338
+
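The pattern table can be read as an ordered list of case-insensitive substring checks. The sketch below assumes exactly that: first match wins, in table order, and only the component name is tested. The README does not say how conflicts such as a name containing both "tab" and "content" are resolved, and `inferRole` is a hypothetical name.

```typescript
// Patterns and roles from the README table; the first-match-wins order is an assumption.
const ROLE_PATTERNS: Array<[RegExp, string]> = [
  [/panel/i, "panel"],
  [/tab/i, "tab"],
  [/page/i, "page"],
  [/dialog|modal/i, "dialog"],
  [/form/i, "form"],
  [/sidebar|nav/i, "navigation"],
  [/header/i, "header"],
  [/footer/i, "footer"],
  [/content/i, "content"],
  [/list/i, "list"],
  [/card/i, "card"],
  [/button/i, "action"],
  [/layout/i, "layout"],
];

// Hypothetical helper: returns the role for a component name, or undefined if nothing matches.
function inferRole(name: string): string | undefined {
  for (const [pattern, role] of ROLE_PATTERNS) {
    if (pattern.test(name)) return role;
  }
  return undefined;
}
```

Under these assumptions `inferRole("ConfirmModal")` yields `"dialog"` and `inferRole("SubmitButton")` yields `"action"`.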
339
+ ### Feature Inference
340
+
341
+ Features are extracted from directory structure:
342
+ - **Next.js App Router**: `app/(group)/feature-name/...` → `feature: "feature-name"`
343
+ - **Component directories**: `components/feature-name/...` → `feature: "feature-name"`
344
+
345
+ ### Route Detection
346
+
347
+ Next.js App Router routes are auto-detected from `app/**/page.tsx` paths and stored as `route` nodes.
348
+
349
+ ### Tab System Detection
350
+
351
+ When the parser encounters `<TabsContent value="xxx">` (or `TabContent`, `TabPanel`), it:
352
+ 1. Creates `PART_OF_TAB` edges from children to the parent component
353
+ 2. Sets `meta.tab = "xxx"` on the child nodes
354
+
355
+ ### Enriched Meta Search
356
+
357
+ All meta fields (role, tab, feature) are queryable via the `searchNodesExtended` function, which combines name LIKE matching with JSON meta extraction in a single SQL query.
358
+
359
+ ---
360
+
361
+ ## Conversation Memory
362
+
363
+ TokenOS includes a persistent memory system for storing conversation context across sessions:
364
+
365
+ - **Storage**: SQLite `memories` table with title, summary, key_points (JSON array), tags (JSON array), and optional embeddings
366
+ - **Auto-indexing**: Markdown files in `/memory/` or `/memories/` directories within the watched project are automatically parsed and stored
367
+ - **Extraction**: Titles from `# headings`, tags from `tags: [...]` patterns, key points from bullet lists
368
+ - **Search**: Text-based search across title, summary, and tags
369
+ - **Integration**: Memories are automatically surfaced in `search` results
370
+
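The extraction rules above could be approximated with three regular expressions, as in this sketch. The exact patterns used by the package are not shown in the README, so `parseMemory`, its return shape, and the regexes are assumptions that only follow the described behaviour (first `#` heading, a `tags: [...]` line, bullet items).

```typescript
// Hypothetical parser for a memory Markdown file.
function parseMemory(markdown: string) {
  // Title: text of the first level-one heading.
  const titleMatch = markdown.match(/^#\s+(.+)$/m);
  const title = titleMatch ? titleMatch[1].trim() : "";

  // Tags: comma-separated values inside `tags: [...]`.
  const tagsMatch = markdown.match(/^tags:\s*\[(.*?)\]/im);
  const tags = tagsMatch
    ? tagsMatch[1].split(",").map((t) => t.trim()).filter((t) => t.length > 0)
    : [];

  // Key points: every "-" or "*" bullet item.
  const keyPoints = Array.from(markdown.matchAll(/^[ \t]*[-*][ \t]+(.+)$/gm)).map((m) =>
    m[1].trim()
  );

  return { title, tags, keyPoints };
}
```

Given a file that starts with `# Auth decisions`, contains `tags: [auth, session]`, and lists two bullets, this returns the title, both tags, and both bullet texts.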
371
+ ---
372
+
373
+ ## Visualization Dashboard
374
+
375
+ Enable the built-in visualization UI by setting `ui.enabled: true` in your config or passing the `--ui` CLI flag.
376
+
377
+ ### Routes
378
+
379
+ | Route | Description |
380
+ |---|---|
381
+ | `/` | **Dashboard** — Glassmorphism-styled overview with stats cards, top nodes grid, and full context explorer table. Animated with GSAP. |
382
+ | `/graph` | **Network Graph** — Interactive force-directed graph visualization using vis-network. Click nodes for details. |
383
+ | `/api/stats` | JSON API: node counts by type + top 50 nodes |
384
+ | `/api/graph-data` | JSON API: full graph data (auto-limited to top 1500 nodes for browser performance) |
385
+
386
+ ---
387
+
388
+ ## Changing the Embedding Model
389
+
390
+ Edit `tokenos.config.json`:
391
+
392
+ ```json
393
+ {
394
+ "ollama": {
395
+ "model": "mxbai-embed-large:latest"
396
+ }
397
+ }
398
+ ```
399
+
400
+ Then reset and re-index (different models produce incompatible vectors):
401
+
402
+ ```bash
403
+ npm run reset
404
+ npm run dev
405
+ ```
406
+
407
+ Popular models:
408
+ - `mxbai-embed-large:latest` — high quality, larger context
409
+ - `nomic-embed-text` — fast, good general purpose (default fallback)
410
+ - `all-minilm` — lightweight, fast
411
+
412
+ ### Embedding Pipeline Details
413
+
414
+ - **Input enrichment**: Embeddings are generated from a structured prompt that includes `[NAME]`, `[TYPE]`, `[ROLE]`, `[TAB]`, `[FEATURE]`, `[ROUTE]`, `[SUMMARY]`, and `[CODE]` (first 300 chars) tags. This produces richer vectors than embedding raw code alone.
415
+ - **LRU caching**: An in-memory LRU cache (1,000 entries, 30-minute TTL) prevents redundant Ollama calls for unchanged text.
416
+ - **Backfill strategy**: On boot, only nodes _without_ existing embeddings are processed. If Ollama becomes unavailable mid-backfill, the process stops gracefully.
417
+ - **Health checks**: Ollama availability is probed once via `/api/tags` with a 2-second timeout. Subsequent failures reset the health flag for retry on next request.
418
+
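To illustrate the caching bullet, here is a minimal LRU cache with a TTL built on `Map` insertion order. It is a sketch under stated assumptions, not the contents of `utils/cache.ts`: the class name, method signatures, and the injectable `now` parameter (added so expiry can be demonstrated deterministically) are all hypothetical. The defaults mirror the README's 1,000 entries and 30 minutes.

```typescript
class LRUCache<V> {
  private entries = new Map<string, { value: V; expiresAt: number }>();
  private maxSize: number;
  private ttlMs: number;

  constructor(maxSize = 1000, ttlMs = 30 * 60 * 1000) {
    this.maxSize = maxSize;
    this.ttlMs = ttlMs;
  }

  get(key: string, now: number = Date.now()): V | undefined {
    const hit = this.entries.get(key);
    if (!hit) return undefined;
    if (hit.expiresAt <= now) {
      this.entries.delete(key); // expired entries are dropped lazily on read
      return undefined;
    }
    // Re-insert so the key becomes the most recently used (Map keeps insertion order).
    this.entries.delete(key);
    this.entries.set(key, hit);
    return hit.value;
  }

  set(key: string, value: V, now: number = Date.now()): void {
    this.entries.delete(key);
    if (this.entries.size >= this.maxSize) {
      // The first key in iteration order is the least recently used.
      const oldest = this.entries.keys().next().value;
      if (oldest !== undefined) this.entries.delete(oldest);
    }
    this.entries.set(key, { value, expiresAt: now + this.ttlMs });
  }
}
```

Keying such a cache on the embedding input text means an unchanged node never triggers a second Ollama request within the TTL window.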
419
+ ---
420
+
421
+ ## Architecture
422
+
423
+ ```
424
+ src/
425
+ ├── main.ts # Entry point — validates config, bootstraps all systems, graceful shutdown
426
+ ├── config.ts # Config loader (CLI args → config file → env vars → defaults)
427
+ ├── reset.ts # Reset script — deletes project DB + WAL/SHM files
428
+ ├── verify-parser.ts # Parser verification — validates node types, edge types, and meta
429
+ ├── types.ts # Shared TypeScript types (10 NodeTypes, 10 EdgeTypes, ConversationMemory)
430
+
431
+ ├── db/
432
+ │ ├── connection.ts # SQLite connection + schema (WAL mode, foreign keys, 3 tables, 7 indexes)
433
+ │ ├── queries.ts # 20+ prepared statements (upsert, search, batch importance, meta queries)
434
+ │ ├── memory.ts # Conversation memory CRUD (upsert, search, get all)
435
+ │ └── index.ts # Re-exports
436
+
437
+ ├── indexer/
438
+ │ ├── parser.ts # ts-morph AST parser — extracts 10 node types, 10 edge types, semantic meta
439
+ │ ├── indexer.ts # Orchestrates parse → hash-skip → transaction (delete stale + upsert fresh)
440
+ │ ├── watcher.ts # chokidar file watcher (incremental: add/change/unlink, ignoreInitial)
441
+ │ ├── ignore.ts # .gitignore rule loader + hardcoded ignores (node_modules, .git, dist, etc.)
442
+ │ └── cli.ts # One-shot CLI indexer (npm run index)
443
+
444
+ ├── embeddings/
445
+ │ ├── client.ts # Ollama HTTP client + enriched embedding input builder + LRU cache
446
+ │ ├── similarity.ts # Cosine similarity + ranked search (top-K)
447
+ │ └── index.ts # backfillEmbeddings() + re-exports
448
+
449
+ ├── server/
450
+ │ ├── server.ts # MCP server — registers 6 tools, BFS subgraph builder, node compression
451
+ │ ├── visualize.ts # Optional visualization dashboard (HTTP server, vis-network + GSAP)
452
+ │ └── index.ts # Re-exports
453
+
454
+ └── utils/
455
+ ├── scoring.ts # Importance score computation (delegates to batch SQL)
456
+ ├── logger.ts # Vite-inspired colored logger (picocolors, per-module tags)
457
+ └── cache.ts # Generic LRU cache with TTL (used by embedding client)
458
+ ```
459
+
460
+ ### Boot Sequence
461
+
462
+ ```
463
+ 1. Validate config (watchPath exists)
464
+ 2. Optionally start visualization UI server
465
+ 3. Full directory indexing (recursive walk, .gitignore-aware)
466
+ 4. Probe Ollama → backfill embeddings for un-embedded nodes
467
+ 5. Batch-compute importance scores (single SQL aggregation)
468
+ 6. Start chokidar watcher (incremental updates only)
469
+ 7. Start MCP stdio server (blocks until client disconnects)
470
+ 8. Print Vite-like status banner with health checks
471
+ 9. Register SIGINT/SIGTERM handlers for graceful shutdown
472
+ ```
473
+
474
+ ### Incremental Update Strategy
475
+
476
+ - **Hash-based skip**: Each file's content is SHA-256 hashed (truncated to 16 chars). Files with unchanged hashes are skipped entirely.
477
+ - **Transactional upsert**: On file change, a single SQLite transaction deletes all stale nodes/edges for the file then inserts fresh data, ensuring FK consistency.
478
+ - **Graceful FK handling**: Edges referencing nodes in un-indexed files are silently skipped (they'll be wired up when the target file is indexed).
479
+
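The hash-based skip can be sketched as follows. It assumes the 16 characters are the leading hex digits of the SHA-256 digest and keeps the last seen hash in a `Map`; the package presumably stores it in SQLite instead. `contentHash` and `needsReindex` are hypothetical names.

```typescript
import { createHash } from "node:crypto";

// SHA-256 of the file content, truncated to 16 hex characters (hex encoding is an assumption).
function contentHash(source: string): string {
  return createHash("sha256").update(source).digest("hex").slice(0, 16);
}

// In-memory stand-in for the stored per-file hash.
const lastSeenHash = new Map<string, string>();

// Returns true when the file is new or its content changed since the last call.
function needsReindex(filePath: string, source: string): boolean {
  const hash = contentHash(source);
  if (lastSeenHash.get(filePath) === hash) return false; // unchanged: skip parsing
  lastSeenHash.set(filePath, hash);
  return true;
}
```

Because the comparison happens before parsing, an editor save that does not alter the content costs one hash computation and no AST work.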
480
+ ---
481
+
482
+ ## Configuration Reference
483
+
484
+ ### `tokenos.config.json`
485
+
486
+ | Field | Type | Default | Description |
487
+ |---|---|---|---|
488
+ | `watchPath` | `string` | `process.cwd()` | Absolute path to the project you want to index |
489
+ | `ollama.url` | `string` | `http://localhost:11434` | Ollama server URL |
490
+ | `ollama.model` | `string` | `nomic-embed-text` | Embedding model to use |
491
+ | `ui.enabled` | `boolean` | `false` | Start visualization dashboard on boot |
492
+ | `ui.port` | `number` | `3333` | Dashboard HTTP server port |
493
+
494
+ ### Environment Variable Overrides
495
+
496
+ | Variable | Overrides |
497
+ |---|---|
498
+ | `OLLAMA_URL` | `ollama.url` |
499
+ | `EMBEDDING_MODEL` | `ollama.model` |
500
+ | `GRAPH_UI_PORT` | `ui.port` |
501
+
502
+ ### CLI Arguments
503
+
504
+ | Argument | Description |
505
+ |---|---|
506
+ | First non-flag arg | Overrides `watchPath` |
507
+ | `--ui` | Enables visualization dashboard (overrides config) |
508
+
509
+ **Precedence**: CLI args → config file → environment variables → defaults
510
+
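One way to read the stated precedence is as a chain of fallbacks per field, sketched below. The function `resolveConfig` and the flattened `ResolvedConfig` shape are assumptions for illustration; the field names, environment variables, and defaults come from the tables above.

```typescript
interface CliArgs {
  watchPath?: string; // first non-flag argument
  ui?: boolean; // true when --ui is passed
}

interface FileConfig {
  watchPath?: string;
  ollama?: { url?: string; model?: string };
  ui?: { enabled?: boolean; port?: number };
}

// Hypothetical flattened result.
interface ResolvedConfig {
  watchPath: string;
  ollamaUrl: string;
  ollamaModel: string;
  uiEnabled: boolean;
  uiPort: number;
}

// Applies CLI args, then the config file, then environment variables, then defaults.
function resolveConfig(
  cli: CliArgs,
  file: FileConfig,
  env: Record<string, string | undefined>,
  cwd: string
): ResolvedConfig {
  return {
    watchPath: cli.watchPath ?? file.watchPath ?? cwd,
    ollamaUrl: file.ollama?.url ?? env.OLLAMA_URL ?? "http://localhost:11434",
    ollamaModel: file.ollama?.model ?? env.EMBEDDING_MODEL ?? "nomic-embed-text",
    uiEnabled: cli.ui ?? file.ui?.enabled ?? false,
    uiPort: file.ui?.port ?? (env.GRAPH_UI_PORT ? Number(env.GRAPH_UI_PORT) : 3333),
  };
}
```

Under this reading, an environment variable only takes effect when the corresponding key is absent from `tokenos.config.json`.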
511
+ ---
512
+
513
+ ## Prerequisites
514
+
515
+ | Dependency | Purpose |
516
+ |---|---|
517
+ | Node.js ≥ 18 | Runtime |
518
+ | `npm` | Package manager |
519
+ | [Ollama](https://ollama.ai/) *(optional)* | Semantic embedding generation |
520
+
521
+ If Ollama is not running, the server starts normally — semantic search falls back to text-mode and embeddings are skipped.
522
+
523
+ ---
524
+
525
+ ## Tech Stack
526
+
527
+ **Core Technologies:**
528
+ - **[TypeScript](https://www.typescriptlang.org/) / [Node.js](https://nodejs.org/)** — Core language and runtime (ES2022 target, Node16 module resolution)
529
+ - **[SQLite](https://sqlite.org/)** — Local, fast, embedded graph database (WAL mode)
530
+ - **[ts-morph](https://ts-morph.com/)** — TypeScript AST parsing tool for static analysis
531
+ - **[Model Context Protocol (MCP)](https://modelcontextprotocol.io/)** — Standardized AI tool integration protocol
532
+ - **[Ollama](https://ollama.com/)** — Local semantic vector embeddings *(optional)*
533
+
534
+ ### Dependencies
535
+
536
+ | Package | Version | Role |
537
+ |---|---|---|
538
+ | `@modelcontextprotocol/sdk` | ^1.8.0 | MCP server + stdio transport |
539
+ | `better-sqlite3` | ^11.9.1 | Synchronous SQLite (Node.js) |
540
+ | `ts-morph` | ^25.0.1 | TypeScript AST parsing |
541
+ | `chokidar` | ^4.0.3 | File watching |
542
+ | `ignore` | ^7.0.5 | `.gitignore`-pattern matching |
543
+ | `zod` | ^4.3.6 | Schema validation for MCP tool inputs |
544
+ | `picocolors` | ^1.1.1 | Terminal colors |
545
+
546
+ ### Dev Dependencies
547
+
548
+ | Package | Version | Role |
549
+ |---|---|---|
550
+ | `@types/better-sqlite3` | ^7.6.12 | SQLite type definitions |
551
+ | `@types/node` | ^22.13.13 | Node.js type definitions |
552
+ | `tsx` | ^4.19.3 | TypeScript dev runner |
553
+ | `typescript` | ^5.8.2 | TypeScript compiler |
554
+
555
+ ---
556
+
557
+ ## Limitations
558
+
559
+ - Only `.ts` and `.tsx` files are indexed (no `.js`, `.jsx`, `.vue`, etc.)
560
+ - Semantic search requires Ollama to be running locally
561
+ - The graph database is built from scratch on the first run for each project
562
+ - Subgraph BFS and cognitive search responses are truncated at 25,000 characters
563
+ - Cross-file edges to un-indexed targets are gracefully skipped (resolved when the dependency is indexed later)
564
+ - Memory file extraction uses basic regex parsing (headings, bullet points, tag patterns)
565
+ - Visualization dashboard loads at most 1,500 nodes to prevent browser rendering issues
566
+
567
+ ---
568
+
569
+ ## License
570
+
571
+ MIT