milens 0.5.0 → 0.5.2

package/README.md CHANGED
@@ -7,7 +7,7 @@
7
7
  <a href="https://www.npmjs.com/package/milens"><img src="https://img.shields.io/npm/v/milens" alt="npm version"></a>
8
8
  <a href="https://github.com/fuze210699/milens/blob/develop/LICENSE"><img src="https://img.shields.io/badge/license-PolyForm--Noncommercial-blue" alt="License: PolyForm Noncommercial"></a>
9
9
  <a href="https://nodejs.org"><img src="https://img.shields.io/badge/node-%3E%3D20-brightgreen" alt="Node.js >= 20"></a>
10
- <img src="https://img.shields.io/badge/languages-11-orange" alt="11 Languages">
10
+ <img src="https://img.shields.io/badge/languages-12-orange" alt="12 Languages">
11
11
  <img src="https://img.shields.io/badge/MCP_tools-19-purple" alt="19 MCP Tools">
12
12
  </p>
13
13
 
@@ -18,26 +18,23 @@
18
18
  <p align="center">
19
19
  <a href="#the-problem">Why?</a> •
20
20
  <a href="#quick-start">Quick Start</a> •
21
- <a href="#what-your-ai-agent-gets">Agent Tools</a> •
21
+ <a href="#cli-commands">CLI Commands</a> •
22
+ <a href="#mcp-tools">MCP Tools</a> •
22
23
  <a href="#editor-setup">Editors</a> •
23
- <a href="#supported-languages">Languages</a>
24
- <a href="#architecture">Architecture</a>
24
+ <a href="#supported-languages">Languages</a>
25
25
  </p>
26
26
 
27
27
  ---
28
28
 
29
29
  ## The Problem
30
30
 
31
- AI agents are blind to structure. They see files as text, not as a connected graph of dependencies.
31
+ You're burning tokens and premium requests on tasks that milens can handle at a fraction of the cost. Every time your agent explores a codebase — reading files one by one, searching for references, tracing call chains — you're paying for what a pre-built knowledge graph delivers instantly.
32
32
 
33
- **A real scenario:**
33
+ **Why not let milens save your wallet?**
34
34
 
35
- 1. You ask your agent to refactor `resolveLinks()` in your codebase
36
- 2. The agent searches for `"resolveLinks"` — finds matches in code, tests, comments, and docs
37
- 3. It renames the function, but misses that `resolveLinksWithStats` wraps it and `analyze()` calls the wrapper — a chain invisible to text search
38
- 4. **Your pipeline breaks. The agent didn't know the call graph.**
35
+ One `milens analyze` replaces dozens of agent tool calls. Instead of the agent spending 10+ steps to map out a function's dependencies, `impact()` returns the full blast radius in a single call.
39
36
 
40
- The root cause: text search can't distinguish a caller from a comment from a type annotation. It has no concept of "what actually depends on this at the code level."
37
+ If you're concerned about security, read our [Security & Privacy](#security--privacy) guarantees: milens is fully offline, with zero telemetry and localhost-only HTTP.
41
38
 
42
39
  ### How milens Solves This
43
40
 
@@ -51,79 +48,259 @@ milens builds a **pre-indexed knowledge graph** at analysis time — resolving e
51
48
 
52
49
  ## Quick Start
53
50
 
54
- **2 commands. That's it.**
55
-
56
51
  ```bash
57
52
  npx milens analyze # index your codebase
58
- npx milens analyze --skills # + generate AI skill files
59
53
  ```
60
54
 
61
- Then add the MCP server to your editor ([setup below](#editor-setup)) and your agent immediately gets 19 tools + 4 resources + 3 prompts — with built-in instructions that teach it how to use them.
62
-
63
- > **No config files needed.** milens sends tool usage guidance via the MCP protocol `initialize` response — every connected agent automatically learns the workflows.
55
+ Then add the MCP server to your editor ([setup below](#editor-setup)) and your agent immediately gets 19 code intelligence tools.
64
56
 
65
57
  ---
66
58
 
67
- ## What Your AI Agent Gets
59
+ ## CLI Commands
68
60
 
69
- ### 19 MCP Tools
61
+ ### `milens analyze` — Index a codebase
70
62
 
71
- | Tool | What It Does |
72
- |---|---|
73
- | **Search & Navigate** | |
74
- | `query` | Symbol search (FTS5 full-text) |
75
- | `grep` | Text search ALL files — templates, SCSS, configs, docs. Scoped: `all`, `code`, `imports`, `definitions` |
76
- | `context` | 360° symbol view — incoming refs + outgoing deps |
77
- | `get_file_symbols` | All symbols in a file with ref/dep counts |
78
- | `get_type_hierarchy` | Inheritance/implementation tree |
79
- | **Impact & Safety** | |
80
- | `impact` | Blast radius: what breaks if this changes? Depth-grouped |
81
- | `edit_check` | Pre-edit safety: callers + exports + re-export chains + test coverage + ⚠ warnings |
82
- | `detect_changes` | `git diff` affected symbols + direct dependents |
83
- | `find_dead_code` | Exported symbols with zero references |
84
- | **Understanding** | |
85
- | `smart_context` | Intent-aware context: `understand` / `edit` / `debug` / `test` — returns only what matters |
86
- | `trace` | Execution flow: call chains from entrypoints to a target (or downstream) |
87
- | `routes` | Detect framework routes/endpoints (Express, FastAPI, NestJS, Flask, Go, PHP, Rails) |
88
- | `explain_relationship` | Shortest dependency path between two symbols |
89
- | **Codebase Overview** | |
90
- | `overview` | Combined context + impact + grep in ONE call (saves 2-3 round trips) |
91
- | `domains` | Domain clusters — groups of files forming logical modules |
92
- | `repos` | List all indexed repositories with summary stats |
93
- | `status` | Index stats, domains, test coverage, staleness |
94
-
95
- ### 4 MCP Resources
96
-
97
- | Resource | What It Returns |
98
- |---|---|
99
- | `milens://overview` | Index overview: stats, domains, coverage, staleness |
100
- | `milens://symbol/{name}` | Symbol definition + relationships |
101
- | `milens://file/{path}` | All symbols in a file |
102
- | `milens://domain/{name}` | Domain cluster details |
63
+ Parse all source files, resolve cross-file dependencies, and build a searchable knowledge graph.
64
+
65
+ ```bash
66
+ milens analyze # index current directory
67
+ milens analyze -p /path/to/repo # index a specific repo
68
+ milens analyze -p . --force # full re-index (ignore cache)
69
+ milens analyze -p . --force --verbose # full re-index with detailed progress
70
+ ```
71
+
72
+ **Output:** A `.milens/milens.db` SQLite database containing every symbol, relationship, and search index.
73
+
74
+ **Incremental mode:** By default, only re-parses files whose SHA-256 hash has changed since the last run.
103
75
 
104
- ### 3 Guided Prompts
76
+ #### Generating AI skill files
105
77
 
106
- | Prompt | Workflow |
78
+ Skill files teach your AI agent the codebase structure — key symbols, entry points, cross-area dependencies — without reading every file.
79
+
80
+ ```bash
81
+ milens analyze -p . --skills # generate for all editors
82
+ milens analyze -p . --skills-copilot # GitHub Copilot only
83
+ milens analyze -p . --skills-cursor # Cursor only
84
+ milens analyze -p . --skills-claude # Claude Code only
85
+ milens analyze -p . --skills-agents # AGENTS.md only
86
+ milens analyze -p . --skills-windsurf # Windsurf only
87
+ ```
88
+
89
+ | Flag | Output files |
107
90
  |---|---|
108
- | `delete-feature` | grep impact → context → full deletion plan |
109
- | `refactor-symbol` | context impact → grep → hierarchy → every file to update |
110
- | `explore-symbol` | query context impact (both directions) → grep → summary |
91
+ | `--skills-copilot` | `.github/instructions/*.instructions.md` + `.github/copilot-instructions.md` |
92
+ | `--skills-cursor` | `.cursor/rules/*.mdc` + `.cursor/index.mdc` |
93
+ | `--skills-claude` | `.claude/skills/generated/*/SKILL.md` + `.claude/rules/*.md` + `CLAUDE.md` |
94
+ | `--skills-agents` | `.agents/skills/*/SKILL.md` + `AGENTS.md` |
95
+ | `--skills-windsurf` | `.windsurfrules` |
96
+
97
+ > Root config files use `<!-- milens:start/end -->` markers for **idempotent injection** — re-running replaces the milens section without overwriting your custom content.
98
+
99
+ ---
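The idempotent injection described in the note above can be sketched as follows. This is a minimal illustration, not the package's implementation: the README only writes the markers as `<!-- milens:start/end -->`, so the two marker strings and the `injectSection` helper below are assumptions.

```javascript
// Hypothetical marker strings; the README abbreviates them as "<!-- milens:start/end -->".
const START = "<!-- milens:start -->";
const END = "<!-- milens:end -->";

// Replace the marked section if it exists, otherwise append a new one.
// Everything outside the markers is left untouched.
function injectSection(fileText, generated) {
  const block = `${START}\n${generated}\n${END}`;
  const startIdx = fileText.indexOf(START);
  const endIdx = fileText.indexOf(END);
  if (startIdx !== -1 && endIdx > startIdx) {
    return fileText.slice(0, startIdx) + block + fileText.slice(endIdx + END.length);
  }
  const sep = fileText.length === 0 || fileText.endsWith("\n") ? "" : "\n";
  return fileText + sep + block + "\n";
}

const original = "# My notes\n";
const once = injectSection(original, "milens section v1");
const twice = injectSection(once, "milens section v1");
const updated = injectSection(once, "milens section v2");
console.log(once === twice); // re-running with the same content changes nothing
```

Because the replacement is bounded by the markers, re-running with new generated content swaps only that section and keeps any hand-written text around it.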
100
+
101
+ ### `milens search` — Find symbols by name
102
+
103
+ Full-text search (BM25) across all indexed symbol names and file paths.
104
+
105
+ ```bash
106
+ milens search "UserService" # search for symbols
107
+ milens search "auth" --limit 50 # increase result limit (default: 20)
108
+ milens search "handler" -p /path/to/repo # search in a specific repo
109
+ ```
110
+
111
+ **Output format:** `name [kind] file:line (exported)`
112
+
113
+ ```
114
+ AuthService [class] src/auth.ts:15 (exported)
115
+ hashPassword [function] src/auth.ts:3 (exported)
116
+ ```
117
+
118
+ ---
119
+
120
+ ### `milens inspect` — 360° symbol view
121
+
122
+ Show everything about a symbol: who calls it (incoming) and what it depends on (outgoing).
123
+
124
+ ```bash
125
+ milens inspect "AuthService"
126
+ milens inspect "resolveLinks" -p .
127
+ ```
128
+
129
+ **Output:**
130
+
131
+ ```
132
+ AuthService [class] src/auth.ts:15
133
+ incoming:
134
+ calls: handleLogin (src/routes.ts)
135
+ calls: UserController (src/controllers/user.ts)
136
+ outgoing:
137
+ imports: User (src/models.ts)
138
+ calls: hashPassword (src/auth.ts)
139
+ calls: createUser (src/models.ts)
140
+ ```
111
141
 
112
- ### Built-in Agent Instructions
142
+ ---
143
+
144
+ ### `milens impact` — Blast radius analysis
145
+
146
+ Recursively trace what depends on a symbol (upstream) or what a symbol depends on (downstream).
147
+
148
+ ```bash
149
+ milens impact "createUser" # upstream: what breaks if this changes?
150
+ milens impact "UserModel" -d downstream # downstream: what does this depend on?
151
+ milens impact "searchSymbols" --depth 2 # limit traversal depth (default: 3)
152
+ ```
153
+
154
+ **Output:**
155
+
156
+ ```
157
+ TARGET: createUser [function] src/models.ts:42
158
+ [depth 1] AuthService [class] src/auth.ts:15 (calls)
159
+ [depth 1] UserController [class] src/controllers/user.ts:8 (calls)
160
+ [depth 2] handleLogin [function] src/routes.ts:23 (calls)
161
+ ```
113
162
 
114
- The MCP server sends **tool usage guidance** on every `initialize` — agents automatically learn:
163
+ **Depth meaning:**
164
+ - Depth 1 = **WILL BREAK** — direct callers/dependents
165
+ - Depth 2 = **LIKELY AFFECTED** — indirect dependents
166
+ - Depth 3 = **MAY NEED TESTING** — transitive dependents
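The depth grouping above can be illustrated with a small breadth-first walk over reverse dependencies. This is only a sketch: the README states that milens runs impact analysis inside SQLite with recursive CTEs, whereas this example swaps in an in-memory map. The edge data and the `upstreamImpact` helper are hypothetical.

```javascript
// Hypothetical reverse-dependency edges: symbol -> symbols that directly depend on it.
const dependents = new Map([
  ["createUser", ["AuthService", "UserController"]],
  ["AuthService", ["handleLogin"]],
  ["UserController", []],
  ["handleLogin", []],
]);

// Breadth-first walk that records the first depth at which each dependent is reached.
function upstreamImpact(target, maxDepth = 3) {
  const depthOf = new Map();
  let frontier = [target];
  for (let depth = 1; depth <= maxDepth && frontier.length > 0; depth++) {
    const next = [];
    for (const name of frontier) {
      for (const dep of dependents.get(name) ?? []) {
        if (dep === target || depthOf.has(dep)) continue; // skip cycles and repeats
        depthOf.set(dep, depth);
        next.push(dep);
      }
    }
    frontier = next;
  }
  return depthOf;
}

const LABELS = { 1: "WILL BREAK", 2: "LIKELY AFFECTED", 3: "MAY NEED TESTING" };
const impact = upstreamImpact("createUser");
for (const [name, depth] of impact) {
  console.log(`[depth ${depth}] ${name} (${LABELS[depth]})`);
}
```

Recording only the first depth at which a symbol appears means each dependent is reported once, at its shortest distance from the target.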
115
167
 
116
- - When to combine `impact` + `grep` (code deps + text references)
117
- - Pre-edit workflow (`edit_check` or `smart_context intent=edit`)
118
- - `query` for code identifiers vs `grep` for display text
119
- - Impact depth meaning: 1 = WILL BREAK, 2 = LIKELY AFFECTED, 3 = MAY NEED TESTING
120
- - unresolved markers vs external (expected) classification
168
+ ---
169
+
170
+ ### `milens serve` — Start MCP server
171
+
172
+ Expose the knowledge graph to AI agents via the Model Context Protocol.
173
+
174
+ ```bash
175
+ milens serve # stdio transport (for editors)
176
+ milens serve -p /path/to/repo # serve a specific repo
177
+ milens serve --http # HTTP transport on port 3100
178
+ milens serve --http --port 8080 # HTTP on custom port
179
+ ```
180
+
181
+ **stdio mode** (default): Used by editors like VS Code, Cursor, and Claude Code. The agent communicates through stdin/stdout.
182
+
183
+ **HTTP mode**: Used by remote agents or custom integrations. Binds to `127.0.0.1` only — no network exposure.
184
+
185
+ Endpoint: `POST http://localhost:3100/mcp`
186
+
187
+ ---
188
+
189
+ ### `milens status` — Show index stats
190
+
191
+ ```bash
192
+ milens status # current directory
193
+ milens status -p /path/to/repo # specific repo
194
+ ```
195
+
196
+ **Output:**
197
+
198
+ ```
199
+ Repository: /home/user/my-project
200
+ Database: /home/user/my-project/.milens/milens.db
201
+ Indexed: 2026-04-11T10:30:00Z
202
+ Symbols: 210
203
+ Links: 348
204
+ Files: 30
205
+ ```
206
+
207
+ ---
208
+
209
+ ### `milens list` — List all indexed repositories
210
+
211
+ ```bash
212
+ milens list
213
+ ```
214
+
215
+ **Output:**
216
+
217
+ ```
218
+ 3 indexed repositories:
219
+
220
+ /home/user/project-a
221
+ DB: /home/user/project-a/.milens/milens.db
222
+ Indexed: 2026-04-11T10:30:00Z
223
+
224
+ /home/user/project-b
225
+ DB: /home/user/project-b/.milens/milens.db
226
+ Indexed: 2026-04-10T15:22:00Z
227
+ ```
228
+
229
+ ---
230
+
231
+ ### `milens clean` — Remove index
232
+
233
+ ```bash
234
+ milens clean # remove index for current directory
235
+ milens clean -p /path/to/repo # remove index for specific repo
236
+ milens clean --all # remove ALL indexes
237
+ ```
238
+
239
+ ---
240
+
241
+ ### `milens dashboard` — Usage analytics
242
+
243
+ Open a browser-based dashboard showing tool usage statistics, token savings, and response times.
244
+
245
+ ```bash
246
+ milens dashboard # open on port 3200
247
+ milens dashboard --port 8080 # custom port
248
+ ```
249
+
250
+ ---
251
+
252
+ ## MCP Tools
253
+
254
+ When the MCP server is running, your AI agent gets these 19 tools:
255
+
256
+ ### Search & Navigate
257
+
258
+ | Tool | What It Does | Example |
259
+ |---|---|---|
260
+ | `query` | Symbol search (FTS5 full-text) | `query({query: "UserService"})` |
261
+ | `grep` | Text search ALL files — code, templates, configs, docs | `grep({pattern: "TODO", scope: "all"})` |
262
+ | `context` | 360° symbol view — incoming refs + outgoing deps | `context({name: "AuthService"})` |
263
+ | `get_file_symbols` | All symbols in a file with ref/dep counts | `get_file_symbols({file: "src/auth.ts"})` |
264
+ | `get_type_hierarchy` | Inheritance/implementation tree | `get_type_hierarchy({name: "BaseController"})` |
265
+
266
+ ### Impact & Safety
267
+
268
+ | Tool | What It Does | Example |
269
+ |---|---|---|
270
+ | `impact` | Blast radius: what breaks if this changes? | `impact({target: "createUser"})` |
271
+ | `edit_check` | Pre-edit safety: callers + exports + test coverage + ⚠ warnings | `edit_check({name: "resolveLinks"})` |
272
+ | `detect_changes` | Git diff → affected symbols + direct dependents | `detect_changes({})` |
273
+ | `find_dead_code` | Exported symbols with zero references | `find_dead_code({})` |
274
+
275
+ ### Understanding
276
+
277
+ | Tool | What It Does | Example |
278
+ |---|---|---|
279
+ | `smart_context` | Intent-aware context: `understand`/`edit`/`debug`/`test` | `smart_context({name: "analyze", intent: "edit"})` |
280
+ | `trace` | Execution flow: call chains to/from entrypoints | `trace({to: "searchSymbols"})` |
281
+ | `routes` | Detect framework routes/endpoints | `routes({})` |
282
+ | `explain_relationship` | Shortest path between two symbols | `explain_relationship({from: "A", to: "B"})` |
283
+ | `overview` | Combined context + impact + grep in ONE call | `overview({name: "Database"})` |
284
+
285
+ ### Codebase Overview
286
+
287
+ | Tool | What It Does | Example |
288
+ |---|---|---|
289
+ | `domains` | Domain clusters — groups of files forming logical modules | `domains({})` |
290
+ | `repos` | List all indexed repositories | `repos({})` |
291
+ | `status` | Index stats, domains, test coverage, staleness | `status({})` |
292
+
293
+ ### Resources & Prompts
294
+
295
+ **4 Resources:** `milens://overview`, `milens://symbol/{name}`, `milens://file/{path}`, `milens://domain/{name}`
296
+
297
+ **3 Guided Prompts:** `delete-feature`, `refactor-symbol`, `explore-symbol` — each triggers a multi-step workflow using the tools above.
121
298
 
122
299
  ---
123
300
 
124
301
  ## Editor Setup
125
302
 
126
- ### VS Code / GitHub Copilot (recommended)
303
+ ### VS Code / GitHub Copilot
127
304
 
128
305
  ```bash
129
306
  npx milens analyze -p . # index your repo (run once)
@@ -143,8 +320,6 @@ Add to `.vscode/mcp.json`:
143
320
  }
144
321
  ```
145
322
 
146
- **Done.** Copilot now has access to 19 code intelligence tools.
147
-
148
323
  <details>
149
324
  <summary><strong>Other Editors</strong></summary>
150
325
 
@@ -194,75 +369,10 @@ command = "npx"
194
369
  args = ["-y", "milens", "serve", "-p", "."]
195
370
  ```
196
371
 
197
- #### HTTP Mode (remote agents)
198
-
199
- ```bash
200
- npx milens serve --http --port 3100 # localhost only, no auth needed
201
- ```
202
-
203
- Endpoint: `POST http://localhost:3100/mcp`
204
-
205
372
  </details>
206
373
 
207
374
  ---
208
375
 
209
- ## Skills Generation
210
-
211
- Generate editor-specific context files from your knowledge graph:
212
-
213
- ```bash
214
- npx milens analyze -p . --skills # all editors at once
215
- npx milens analyze -p . --skills-copilot # GitHub Copilot only
216
- npx milens analyze -p . --skills-cursor # Cursor only
217
- npx milens analyze -p . --skills-claude # Claude Code only
218
- npx milens analyze -p . --skills-agents # AGENTS.md only
219
- npx milens analyze -p . --skills-windsurf # Windsurf only
220
- ```
221
-
222
- This generates per-area skill files with: key symbols, entry points, cross-area dependencies, and **MCP tool usage instructions** — so agents know both the codebase structure and how to use milens tools.
223
-
224
- | Output Path | Editor |
225
- |---|---|
226
- | `.github/instructions/*.instructions.md` + `.github/copilot-instructions.md` | GitHub Copilot |
227
- | `.cursor/rules/*.mdc` + `.cursor/index.mdc` | Cursor |
228
- | `.claude/skills/generated/*/SKILL.md` + `.claude/rules/*.md` + `CLAUDE.md` | Claude Code |
229
- | `.agents/skills/*/SKILL.md` + `AGENTS.md` | 40+ agents |
230
- | `.windsurfrules` | Windsurf |
231
-
232
- > Root config files use `<!-- milens:start/end -->` markers for **idempotent injection** — re-running replaces the milens section without overwriting your custom content.
233
-
234
- ---
235
-
236
- ## CLI Commands
237
-
238
- ```bash
239
- # ── Index & Explore ──
240
- npx milens analyze -p . # index current directory
241
- npx milens analyze -p . --force --verbose # full re-index with progress
242
- npx milens search "UserService" # search symbols (FTS5)
243
- npx milens inspect "AuthService" # 360° view: refs + deps
244
-
245
- # ── Impact Analysis ──
246
- npx milens impact "createUser" # what breaks if this changes?
247
- npx milens impact "UserModel" -d downstream # what does this depend on?
248
-
249
- # ── MCP Server ──
250
- npx milens serve -p . # stdio (for editors)
251
- npx milens serve --http --port 3100 # HTTP (for remote agents)
252
-
253
- # ── Management ──
254
- npx milens status -p . # index stats
255
- npx milens list # all indexed repos
256
- npx milens clean -p . # remove index
257
- npx milens clean --all # remove all indexes
258
-
259
- # ── Dashboard ──
260
- npx milens dashboard # usage analytics on port 3200
261
- npx milens dashboard --port 8080 # custom port
262
- ```
263
-
264
- ---
265
-
266
376
  ## Supported Languages
267
377
 
268
378
  | Language | Extensions | Imports | Calls | Heritage | Frameworks |
@@ -278,6 +388,7 @@ npx milens dashboard --port 8080 # custom port
278
388
  | Vue | `.vue` | ✓ | ✓ template refs | ✓ | Vue 3 SFC |
279
389
  | HTML | `.html` `.htm` | ✓ `<script src>` `<link>` | ✓ inline `<script>` | — | — |
280
390
  | CSS | `.css` | ✓ `@import` | — | — | Custom properties |
391
+ | Markdown | `.md` `.mdx` | ✓ local `[links]()` | — | — | Headings → section symbols, parent-child hierarchy |
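The Markdown row above says headings become section symbols with a parent-child hierarchy and local links count as imports. The sketch below shows one way a regex-based extractor could do that. It is not the package's `extractMarkdown`, whose source is not part of this diff: `sketchExtractMarkdown` and the symbol fields are hypothetical, and only the `symbols`, `imports`, `modulePath`, and `filePath` names are taken from the engine code in this diff.

```javascript
// Hypothetical extractor: headings -> section symbols, local links -> imports.
function sketchExtractMarkdown(source, filePath) {
  const symbols = [];
  const imports = [];
  const stack = []; // currently open headings, outermost first
  let inFence = false;
  source.split("\n").forEach((line, i) => {
    if (/^\s*(```|~~~)/.test(line)) { inFence = !inFence; return; }
    if (inFence) return; // ignore '#' lines inside code fences
    const heading = /^(#{1,6})\s+(.+?)\s*$/.exec(line);
    if (heading) {
      const level = heading[1].length;
      // Close headings at the same or a deeper level; the remaining top is the parent.
      while (stack.length > 0 && stack[stack.length - 1].level >= level) stack.pop();
      const parent = stack.length > 0 ? stack[stack.length - 1].name : null;
      const symbol = { name: heading[2], kind: "section", level, parent, filePath, line: i + 1 };
      symbols.push(symbol);
      stack.push(symbol);
      return;
    }
    for (const m of line.matchAll(/\[[^\]]*\]\(([^)\s]+)\)/g)) {
      const target = m[1];
      // Keep only local targets: skip URLs with a scheme and in-page anchors.
      if (/^[a-z][a-z0-9+.-]*:/i.test(target) || target.startsWith("#")) continue;
      imports.push({ modulePath: target.split("#")[0], filePath, line: i + 1 });
    }
  });
  return { symbols, imports };
}

const doc = [
  "# Guide",
  "## Install",
  "See [setup](./setup.md#linux) and [site](https://example.com).",
  "## Usage",
].join("\n");
const result = sketchExtractMarkdown(doc, "docs/guide.md");
console.log(result.symbols.map((s) => `${s.name} <- ${s.parent}`));
console.log(result.imports.map((imp) => imp.modulePath));
```

The heading stack is what produces the hierarchy: a new heading pops everything at its own level or deeper, so the entry left on top is its parent section.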
281
392
 
282
393
  ---
283
394
 
@@ -287,117 +398,46 @@ npx milens dashboard --port 8080 # custom port
287
398
  <img src="docs/diagram2.svg" alt="milens architecture: Indexing Pipeline → MCP Server → AI Agent" width="700">
288
399
  </p>
289
400
 
290
- ### Multi-Repo Architecture
401
+ ### How it works
402
+
403
+ 1. **Scan** — find all source files matching supported extensions
404
+ 2. **Parse** — tree-sitter extracts symbols (functions, classes, methods, etc.) and raw references (imports, calls, extends)
405
+ 3. **Resolve** — cross-file link resolution: match import paths to files, match call names to symbol definitions
406
+ 4. **Enrich** — compute roles (entrypoint/hub/utility/leaf), heat scores, domain clusters
407
+ 5. **Persist** — store everything in SQLite with FTS5 search and recursive CTEs for graph traversal
291
408
 
292
- milens uses a **global registry** — one MCP server serves all indexed repos. No per-project server config needed.
409
+ ### Multi-Repo
410
+
411
+ milens uses a global registry (`~/.milens/`) — one MCP server serves all indexed repos. Pass `repo` to target a specific one when multiple are registered.
293
412
 
294
413
  <p align="center">
295
- <img src="docs/diagram3.svg" alt="Multi-repo architecture: CLI → Registry → Per-Repo DBs → MCP Server" width="500">
414
+ <img src="docs/diagram3.svg" alt="Multi-repo architecture" width="500">
296
415
  </p>
297
416
 
298
- > With a single indexed repo, all tools work without specifying `repo`. When multiple repos are registered, pass `repo` to target a specific one.
299
-
300
417
  ### Design Decisions
301
418
 
302
419
  | Decision | Rationale |
303
420
  |---|---|
304
- | **Declarative LangSpec** | Each language = 1 config object with tree-sitter queries. One universal extractor for all 11 languages |
421
+ | **Declarative LangSpec** | Each language = 1 config object with tree-sitter queries. One universal extractor for all 12 languages |
305
422
  | **SQLite + recursive CTE** | Impact analysis runs entirely in the database — no full graph in memory |
306
423
  | **Token-compact output** | `name [kind] file:line` format — saves 40-60% tokens for AI |
307
424
  | **Incremental by hash** | SHA-256 file hashing — only changed files get re-parsed |
308
- | **Union-Find domains** | Graph-based clustering (files with ≥2 mutual links = same domain) — smarter than directory-based |
309
- | **External-aware resolution** | Separates internal unresolved (⚠ data quality) from external packages (✓ expected) |
310
- | **Lazy DB pools** | Connections opened on demand, evicted after 5min idle |
311
425
  | **Localhost-only HTTP** | Binds `127.0.0.1` — no network exposure without explicit intent |
312
426
 
313
427
  ---
314
428
 
315
429
  ## Security & Privacy
316
430
 
317
- milens is **offline by design** — zero network calls, zero telemetry. Everything executes on your machine.
431
+ milens is **offline by design** — zero network calls, zero telemetry.
318
432
 
319
433
  | Layer | Protection |
320
434
  |---|---|
321
- | **Data locality** | Index lives in `.milens/` per repo (gitignored). Global registry (`~/.milens/`) stores only file paths — no source code |
322
- | **HTTP transport** | Binds to `127.0.0.1` only — requires explicit `--http` flag, never auto-exposed |
323
- | **User-supplied regex** | Validated against ReDoS patterns before execution |
324
- | **FTS5 queries** | Each search token quoted as a literal — no query injection |
435
+ | **Data locality** | Index lives in `.milens/` per repo (gitignored). No source code stored in registry |
436
+ | **HTTP transport** | Binds to `127.0.0.1` only — requires explicit `--http` flag |
437
+ | **User-supplied regex** | Validated against ReDoS patterns |
438
+ | **FTS5 queries** | Each token quoted as a literal — no query injection |
325
439
  | **File access** | All reads bounded to the repo root — no path traversal |
326
- | **Git integration** | Uses `execFileSync` with argument arrays — no shell interpolation |
327
-
328
- ---
329
-
330
- ## Tool Examples
331
-
332
- These examples are from **milens indexing itself** (`npx milens analyze -p .`):
333
-
334
- ```
335
- # Pre-edit safety check — real output from milens self-index
336
- edit_check({name: "createMcpServer"})
337
- → createMcpServer [function] src/server/mcp.ts:272 {utility,heat:70} (exported)
338
- callers (2):
339
- calls: startStdio [function] src/server/mcp.ts:1475
340
- calls: startHttp [function] src/server/mcp.ts:1483
341
- deps (32): searchSymbols, findSymbolByName, getIncomingLinks, findUpstream,
342
- grepFiles, traceToEntrypoints, getDomainStats, getStaleFiles, ...
343
-
344
- # Context — 360° view with callers and callees
345
- context({name: "analyze"})
346
- → analyze [function] src/analyzer/engine.ts:23 {utility,heat:55} (exported)
347
- incoming:
348
- calls: src/cli.ts (CLI entry point)
349
- outgoing (26 deps):
350
- calls: scanFiles [function] src/analyzer/scanner.ts:11
351
- calls: resolveLinksWithStats [function] src/analyzer/resolver.ts:27
352
- calls: enrichMetadata [function] src/analyzer/enrich.ts:21
353
- calls: loadLanguage [function] src/parser/loader.ts:20
354
- calls: transaction, insertSymbol, insertLink, rebuildSearch ... (db ops)
355
-
356
- # Impact analysis — what breaks if searchSymbols changes?
357
- impact({target: "searchSymbols", direction: "upstream"})
358
- → depth 1:
359
- createMcpServer [function] src/server/mcp.ts:272 (calls)
360
- depth 2:
361
- startStdio [function] src/server/mcp.ts:1475 (calls)
362
- startHttp [function] src/server/mcp.ts:1483 (calls)
363
-
364
- # File symbols — what's inside a file?
365
- get_file_symbols({file: "src/store/db.ts"})
366
- → src/store/db.ts: 45 symbols
367
- Database [class] L10-461 (exported) ← 0 refs, → 0 deps
368
- searchSymbols [method] L150-165 ← 3 refs, → 0 deps
369
- findUpstream [method] L192-195 ← 3 refs, → 1 deps
370
- traceToEntrypoints [method] L346-388 ← 2 refs, → 2 deps
371
- getDomainStats [method] L406-417 ← 3 refs, → 0 deps
372
- ... (40 more)
373
- ```
374
-
375
- ---
376
-
377
- ## Adding a Language
378
-
379
- Create `src/parser/lang-xxx.ts`:
380
-
381
- ```typescript
382
- import type { LangSpec } from './extract.js';
383
-
384
- const spec: LangSpec = {
385
- id: 'xxx',
386
- extensions: ['.xxx'],
387
- wasmName: 'tree-sitter-xxx',
388
- queries: {
389
- functions: `(function_definition name: (identifier) @name) @def`,
390
- classes: `(class_definition name: (identifier) @name) @def`,
391
- },
392
- resolveImport(raw, fromFile, root, aliases) {
393
- // return resolved file path or null
394
- },
395
- };
396
-
397
- export default spec;
398
- ```
399
-
400
- Then register it in `src/parser/languages.ts`.
440
+ | **Git integration** | `execFileSync` with argument arrays — no shell interpolation |
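The FTS5 row in the table above says each token is quoted as a literal. A minimal sketch of that idea follows, assuming tokens are split on whitespace; `toFtsQuery` is a hypothetical helper, not a function from the package. It relies on the FTS5 rule that a double quote inside a quoted string is escaped by doubling it.

```javascript
// Quote each whitespace-separated token as an FTS5 string literal so that
// operators such as OR, NEAR, or * are matched as plain text, not interpreted.
function toFtsQuery(userInput) {
  return userInput
    .trim()
    .split(/\s+/)
    .filter((token) => token.length > 0)
    .map((token) => `"${token.replace(/"/g, '""')}"`)
    .join(" ");
}

console.log(toFtsQuery("auth* OR drop"));
```

The quoted string would then typically be passed to `MATCH` as a bound parameter rather than concatenated into the SQL text.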
401
441
 
402
442
  ---
403
443
 
@@ -406,11 +446,10 @@ Then register it in `src/parser/languages.ts`.
406
446
  ```bash
407
447
  npm install # install dependencies
408
448
  npm run build # tsc → dist/
409
- npm test # vitest (43 tests)
449
+ npm test # vitest (55 tests)
410
450
  npm run lint # tsc --noEmit
411
451
  npm run self-analyze # index this repo
412
452
  npm run self-serve # start MCP server on port 3100
413
- npx milens dashboard # open usage analytics dashboard
414
453
  ```
415
454
 
416
455
  ---
@@ -1 +1 @@
1
- {"version":3,"file":"engine.d.ts","sourceRoot":"","sources":["../../src/analyzer/engine.ts"],"names":[],"mappings":"AAWA,OAAO,KAAK,EAA8E,aAAa,EAAE,MAAM,aAAa,CAAC;AAI7H,UAAU,aAAa;IACrB,QAAQ,EAAE,MAAM,CAAC;IACjB,MAAM,EAAE,MAAM,CAAC;IACf,OAAO,CAAC,EAAE,OAAO,CAAC;IAClB,KAAK,CAAC,EAAE,OAAO,CAAC;IAChB,OAAO,CAAC,EAAE,MAAM,CAAC,MAAM,EAAE,MAAM,CAAC,CAAC;CAClC;AAED,wBAAsB,OAAO,CAAC,IAAI,EAAE,aAAa,GAAG,OAAO,CAAC,aAAa,CAAC,CAmMzE"}
1
+ {"version":3,"file":"engine.d.ts","sourceRoot":"","sources":["../../src/analyzer/engine.ts"],"names":[],"mappings":"AAYA,OAAO,KAAK,EAA8E,aAAa,EAAE,MAAM,aAAa,CAAC;AAI7H,UAAU,aAAa;IACrB,QAAQ,EAAE,MAAM,CAAC;IACjB,MAAM,EAAE,MAAM,CAAC;IACf,OAAO,CAAC,EAAE,OAAO,CAAC;IAClB,KAAK,CAAC,EAAE,OAAO,CAAC;IAChB,OAAO,CAAC,EAAE,MAAM,CAAC,MAAM,EAAE,MAAM,CAAC,CAAC;CAClC;AAED,wBAAsB,OAAO,CAAC,IAAI,EAAE,aAAa,GAAG,OAAO,CAAC,aAAa,CAAC,CA0OzE"}
@@ -6,6 +6,7 @@ import { getParser, loadLanguage } from '../parser/loader.js';
6
6
  import { extractFromTree, clearQueryCache } from '../parser/extract.js';
7
7
  import { extractVueScript, extractVueTemplateRefs } from '../parser/lang-vue.js';
8
8
  import { extractHtmlScripts, extractHtmlRefs } from '../parser/lang-html.js';
9
+ import { extractMarkdown } from '../parser/lang-md.js';
9
10
  import { resolveLinksWithStats } from './resolver.js';
10
11
  import { enrichMetadata } from './enrich.js';
11
12
  import { Database } from '../store/db.js';
@@ -30,6 +31,10 @@ export async function analyze(opts) {
30
31
  group.push({ ...file, spec });
31
32
  langGroups.set(spec.wasmName, group);
32
33
  }
34
+ // Phase 2.5: Separate document files (regex-based, no tree-sitter)
35
+ const docGroup = langGroups.get('');
36
+ if (docGroup)
37
+ langGroups.delete('');
33
38
  // Phase 3: Parse & extract — process each language group together
34
39
  // This keeps the same parser/language/compiled queries hot in cache
35
40
  const symbolsByFile = new Map();
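The Phase 2.5 step in this hunk can be sketched in isolation. The file list and spec objects below are hypothetical. That document specs such as Markdown carry an empty `wasmName` is an inference from `langGroups.get('')` in this diff, not something the diff states directly.

```javascript
// Hypothetical scan results; only the empty-wasmName convention is inferred from the diff.
const files = [
  { relativePath: "src/a.ts", spec: { id: "typescript", wasmName: "tree-sitter-typescript" } },
  { relativePath: "src/b.ts", spec: { id: "typescript", wasmName: "tree-sitter-typescript" } },
  { relativePath: "README.md", spec: { id: "markdown", wasmName: "" } },
];

// Group files by grammar so each parser is loaded once per group.
const langGroups = new Map();
for (const file of files) {
  const group = langGroups.get(file.spec.wasmName) ?? [];
  group.push(file);
  langGroups.set(file.spec.wasmName, group);
}

// Pull out the regex-based document group so the tree-sitter loop never sees it.
const docGroup = langGroups.get("");
if (docGroup) langGroups.delete("");

console.log([...langGroups.keys()], docGroup.map((f) => f.relativePath));
```

Removing the empty-key group before the main loop keeps `getParser(wasmName)` from being called with a grammar name that does not exist.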
@@ -41,6 +46,40 @@ export async function analyze(opts) {
41
46
  const resolvedImportPaths = new Map();
42
47
  const parsedFiles = new Set();
43
48
  let filesParsed = 0;
49
+ // Process document files (no tree-sitter needed)
50
+ if (docGroup) {
51
+ for (const file of docGroup) {
52
+ const source = readFileSync(file.absolutePath, 'utf-8');
53
+ if (!opts.force && db.isFileUpToDate(file.relativePath, source)) {
54
+ if (opts.verbose)
55
+ console.log(`[skip] ${file.relativePath} (unchanged)`);
56
+ continue;
57
+ }
58
+ try {
59
+ const result = parseDocFile(source, file.relativePath, file.spec);
60
+ if (!result)
61
+ continue;
62
+ symbolsByFile.set(file.relativePath, result.symbols);
63
+ allSymbols.push(...result.symbols);
64
+ allImports.push(...result.imports);
65
+ for (const imp of result.imports) {
66
+ const resolved = file.spec.resolveImport(imp.modulePath, imp.filePath, rootPath, aliases);
67
+ if (resolved) {
68
+ resolvedImportPaths.set(`${imp.filePath}::${imp.modulePath}`, resolved);
69
+ }
70
+ }
71
+ db.upsertFileHash(file.relativePath, source);
72
+ parsedFiles.add(file.relativePath);
73
+ filesParsed++;
74
+ if (opts.verbose)
75
+ console.log(`[parse] ${file.relativePath}: ${result.symbols.length} symbols`);
76
+ }
77
+ catch (err) {
78
+ if (opts.verbose)
79
+ console.error(`[error] ${file.relativePath}: ${err}`);
80
+ }
81
+ }
82
+ }
44
83
  for (const [wasmName, group] of langGroups) {
45
84
  // Pre-load parser + language once per group
46
85
  const parser = await getParser(wasmName);
@@ -302,6 +341,12 @@ async function parseFile(source, filePath, spec, parser, lang) {
302
341
  }
303
342
  return result;
304
343
  }
344
+ function parseDocFile(source, filePath, spec) {
345
+ if (spec.id === 'markdown') {
346
+ return extractMarkdown(source, filePath);
347
+ }
348
+ return null;
349
+ }
305
350
  /** Check if a file path looks like a test/spec file */
306
351
  function isTestFile(filePath) {
307
352
  return /\.(test|spec)\.[jt]sx?$/.test(filePath) ||