xindex 1.0.0 → 1.0.1

package/README.md CHANGED
@@ -1,37 +1,16 @@
1
1
  # xindex
2
2
 
3
- Local semantic code search. Index your codebase, search by meaning — no cloud, no API keys. Also runs as an MCP server so Claude Code (and other MCP clients) can search your repo directly.
3
+ **grep matches text. xindex matches meaning.**
4
4
 
5
- ## Features
6
-
7
- - **Local** — everything runs on your machine; embeddings cached on disk
8
- - **Semantic search** — natural-language queries, not just substring match
9
- - **MCP server** — plug into Claude Code via `.mcp.json`
10
- - **Watch mode** — keep the index warm while you code
11
- - **Gitignore-aware** — respects `.gitignore` + custom ignore rules
12
- - **Zero config** — works with defaults; `.xindex.json` is optional
13
-
14
- ## How it fits together
15
-
16
- ```
17
- your repo xindex
18
- ───────── ──────
19
- *.ts / *.md ──► walk ──► keywords ──► embed ──► .xindex/
20
- .gitignore (vectra index)
21
-
22
- CLI / MCP ◄── search ◄── embed query ◄── "question" ┘
23
- ```
5
+ Local semantic code search for your codebase — plus an MCP server so Claude Code (and any MCP client) can search your repo directly. Fully local, no cloud, no API keys.
24
6
 
25
7
  ## Install
26
8
 
27
9
  ```bash
28
- git clone <repo-url> xindex
29
- cd xindex
30
- yarn install # or npm install
31
- npm link # makes xindex-* binaries + xindex-mcp available on PATH
10
+ npm i -g xindex
32
11
  ```
33
12
 
34
- Requires Node.js. First run downloads the embedding model (`all-MiniLM-L6-v2`, ~25MB) after that, fully offline.
13
+ First run downloads a small embedding model (`all-MiniLM-L6-v2`, ~23MB). After that, fully offline.
35
14
 
36
15
  ## Quick start
37
16
 
@@ -43,7 +22,35 @@ xindex-search "where is auth handled" # ask a question
43
22
 
44
23
  Index lives in `./.xindex/` — add it to `.gitignore`.
45
24
 
46
- ## CLI
25
+ ## Use with Claude Code (MCP)
26
+
27
+ Drop this into `.mcp.json` at your project root:
28
+
29
+ ```json
30
+ {
31
+ "mcpServers": {
32
+ "xindex": {
33
+ "command": "xindex-mcp",
34
+ "args": []
35
+ }
36
+ }
37
+ }
38
+ ```
39
+
40
+ Open the project in Claude Code — it picks up the xindex MCP server and can call `xindex_search`, `xindex_index`, and `xindex_reset` directly. Fewer hallucinations, fewer round-trips.
41
+
42
+ ## Features
43
+
44
+ - **Local** — everything runs on your machine; embeddings cached on disk
45
+ - **Semantic search** — natural-language queries, not substring match
46
+ - **MCP server** — plugs into Claude Code via `.mcp.json`
47
+ - **Watch mode** — keeps the index warm while you code
48
+ - **Gitignore-aware** — respects `.gitignore` + custom ignore rules
49
+ - **Zero config** — works with defaults; `.xindex.json` is optional
50
+
51
+ ---
52
+
53
+ ## CLI reference
47
54
 
48
55
  All five binaries run from any directory; they index/search the current working directory.
49
56
 
@@ -81,24 +88,7 @@ xindex-mcp --watch-disabled # no watch
81
88
  xindex-mcp --watch-dir=./src # watch a specific dir
82
89
  ```
83
90
 
84
- ## MCP (Claude Code & others)
85
-
86
- Drop this into `.mcp.json` at your project root:
87
-
88
- ```json
89
- {
90
- "mcpServers": {
91
- "xindex": {
92
- "command": "xindex-mcp",
93
- "args": []
94
- }
95
- }
96
- }
97
- ```
98
-
99
- Requires `xindex-mcp` on PATH (`npm link` inside this repo does it). If you'd rather pin to an absolute path, use `/absolute/path/to/bin/xindex-mcp`.
100
-
101
- ### Tools exposed
91
+ ## MCP tools
102
92
 
103
93
  | Tool | What it does | Input |
104
94
  |------|--------------|-------|
@@ -108,14 +98,6 @@ Requires `xindex-mcp` on PATH (`npm link` inside this repo does it). If you'd ra
108
98
 
109
99
  Note: CLI `xindex-search` defaults to 10 results; MCP `xindex_search` defaults to 5.
110
100
 
111
- ### Typical Claude Code flow
112
-
113
- 1. Commit `.mcp.json` to your repo.
114
- 2. Open the project in Claude Code — it picks up the xindex MCP server.
115
- 3. Ask it to call `xindex_index` once with `inputs: ["."]`.
116
- 4. From then on, it uses `xindex_search` for natural-language lookups.
117
- 5. Watch mode keeps the index fresh as you edit.
118
-
119
101
  ## Configuration
120
102
 
121
103
  ### `.xindex.json` (optional)
@@ -140,57 +122,29 @@ Created automatically. Contains:
140
122
 
141
123
  **Always gitignore it.**
142
124
 
143
- ### `.gitignore`
125
+ ### `.gitignore` minimum
144
126
 
145
- Minimum:
146
127
  ```
147
128
  .xindex
148
129
  node_modules/
149
130
  ```
150
131
 
151
- ## Examples
152
-
153
- ### Index + search from the terminal
154
-
155
- ```bash
156
- cd my-app
157
- xindex-index .
158
- xindex-search "rate limiter implementation"
159
- ```
160
-
161
- ### Keep the index warm while coding
162
-
163
- ```bash
164
- xindex-watch .
165
- # edit files in another terminal; index updates incrementally
166
- # Ctrl+C to stop
167
- ```
168
-
169
- ### Use from Claude Code via MCP
170
-
171
- ```bash
172
- # one-time setup
173
- cd xindex && npm link
174
-
175
- # in your project
176
- echo '{"mcpServers":{"xindex":{"command":"xindex-mcp","args":[]}}}' > .mcp.json
132
+ ## How it fits together
177
133
 
178
- # open project in Claude Code — xindex tools are available
179
134
  ```
180
-
181
- ### Run MCP without watching
182
-
183
- ```bash
184
- xindex-mcp --watch-disabled
135
+ your repo xindex
136
+ ───────── ──────
137
+ *.ts / *.md ──► walk ──► keywords ──► embed ──► .xindex/
138
+ .gitignore (vectra index)
139
+
140
+ CLI / MCP ◄── search ◄── embed query ◄── "question" ┘
185
141
  ```
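
Editor's sketch (not part of the package): the search path in the diagram above boils down to embedding the query and ranking stored vectors by similarity. The toy below illustrates only that idea. Everything in it is an assumption made for illustration: `embed` is a bag-of-keywords stand-in for the real `all-MiniLM-L6-v2` model, the in-memory array stands in for the vectra index, and `indexFiles` / `rankByMeaning` are hypothetical names, not xindex APIs.

```typescript
// Hypothetical stand-ins: a fixed vocabulary replaces the real embedding model,
// and a plain array replaces the on-disk vectra index.
const VOCAB = ["auth", "login", "token", "watch", "debounce", "file"];

type Entry = { file: string; vector: number[] };

// Toy "embedding": count how often each vocabulary word occurs in the text.
function embed(text: string): number[] {
  const words = text.toLowerCase().split(/[^a-z0-9]+/).filter(Boolean);
  return VOCAB.map((term) => words.filter((w) => w === term).length);
}

// Cosine similarity; returns 0 when either vector is all zeros.
function cosine(a: number[], b: number[]): number {
  let dot = 0;
  let na = 0;
  let nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return na === 0 || nb === 0 ? 0 : dot / Math.sqrt(na * nb);
}

// "Index" step: keywords -> vector, one entry per file.
function indexFiles(files: Record<string, string>): Entry[] {
  return Object.entries(files).map(([file, keywords]) => ({ file, vector: embed(keywords) }));
}

// "Search" step: embed the query, rank entries by similarity, keep the top `limit`.
function rankByMeaning(entries: Entry[], query: string, limit = 10): { file: string; score: number }[] {
  const q = embed(query);
  return entries
    .map((e) => ({ file: e.file, score: cosine(q, e.vector) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, limit);
}

const toyIndex = indexFiles({
  "src/auth.ts": "auth login token",
  "src/watch.ts": "file watch debounce",
});
const hits = rankByMeaning(toyIndex, "where is login token handled", 1);
console.log(hits[0].file);
```

A real embedding model places related words near each other even when they share no tokens, which is the property this keyword-count toy cannot show.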
186
142
 
187
- You control when reindexing happens via explicit `xindex_index` calls.
188
-
189
143
  ## Project layout
190
144
 
191
145
  ```
192
146
  apps/ entry points (run.*.ts) + app composers (IndexApp, SearchApp, McpApp, ...)
193
- bin/ shebang wrappers invoked by npm/yarn and .mcp.json
147
+ bin/ shebang wrappers invoked by npm and .mcp.json
194
148
  componets/ shared building blocks: config, walk, watch, embed, vectra adapter, logger
195
149
  features/ domain operations: indexContent, searchIndex, removeContent, resetIndex
196
150
  packages/ small internal libs (streamx, fun)
@@ -201,6 +155,17 @@ packages/ small internal libs (streamx, fun)
201
155
 
202
156
  See [CLAUDE.md](CLAUDE.md) for contributor conventions (HOF pattern, logger rules, task workflow).
203
157
 
158
+ ## Development
159
+
160
+ Working on xindex itself? Clone and link:
161
+
162
+ ```bash
163
+ git clone <repo-url> xindex
164
+ cd xindex
165
+ yarn install # or npm install
166
+ npm link # exposes xindex-* binaries from your working copy
167
+ ```
168
+
204
169
  ## License
205
170
 
206
171
  MIT
@@ -0,0 +1,139 @@
1
+ # xindex — local semantic code search, with an MCP server built in
2
+
3
+ > Index your repo, search by meaning, no cloud. Works with Claude Code out of the box.
4
+
5
+ ---
6
+
7
+ ## The problem
8
+
9
+ You land in a 50,000-line codebase someone else wrote. You need "the part that handles auth retries."
10
+
11
+ ```
12
+ grep -r "retry" .
13
+ ```
14
+
15
+ 200 matches. Most unrelated. You try `auth`. 400 matches.
16
+
17
+ You know what you want. You just can't spell it in one string.
18
+
19
+ ## Why I built this
20
+
21
+ I wanted grep's ergonomics with a real search engine's semantics — but running entirely on my laptop. Cloud code search solves the meaning problem, but I'm not uploading half-finished private repos to anyone's servers. IDE search works inside files you've already found. Neither answers *"where in this entire repo is X done?"* without the round-trip to the cloud.
22
+
23
+ So I built a small tool that indexes a repo once and lets me ask it questions. Then the MCP ecosystem happened, and the same index became the single most useful thing I could hand to Claude Code.
24
+
25
+ ## What xindex is
26
+
27
+ A small CLI that indexes your codebase and lets you search it by natural-language meaning. It also runs as an MCP server, so Claude Code (or any MCP client) can call it directly.
28
+
29
+ - **Local** — nothing leaves your machine
30
+ - **Semantic** — natural-language queries, not substring matches
31
+ - **MCP built in** — four lines of JSON to wire into Claude Code
32
+ - **Watch mode** — keeps the index fresh as you edit
33
+ - **Gitignore-aware** — plus your own rules
34
+
35
+ What it's *not*: not a grep replacement (exact strings — grep wins), not code intelligence (no symbols/refs — your IDE wins). It's a focused semantic index. Nothing more.
36
+
37
+ ## 30 seconds, end to end
38
+
39
+ ```bash
40
+ npm i -g xindex
41
+
42
+ cd my-project
43
+ xindex-index .
44
+ xindex-search "rate limiter logic"
45
+ ```
46
+
47
+ First run downloads a ~23MB embedding model (one time). Then you get ranked file paths back — enough to jump straight to the right place.
48
+
49
+ ## The part I'm actually excited about
50
+
51
+ Semantic search is one of the highest-leverage tools you can give an AI assistant in an unfamiliar repo.
52
+
53
+ Drop this into `.mcp.json`:
54
+
55
+ ```json
56
+ {
57
+ "mcpServers": {
58
+ "xindex": {
59
+ "command": "xindex-mcp",
60
+ "args": []
61
+ }
62
+ }
63
+ }
64
+ ```
65
+
66
+ Open the project in Claude Code. Ask about your codebase. Watch Claude call `xindex_search` and come back with real file references instead of invented ones.
67
+
68
+ The hallucinations drop. The round-trips drop. That alone was worth shipping.
69
+
70
+ ## What I'm not pretending
71
+
72
+ v1 of a tool I built for myself:
73
+
74
+ - First run needs network (model download)
75
+ - One repo at a time; no cross-repo search
76
+ - No AST awareness — works on keywords, not structure
77
+ - Quality depends on descriptive names — `x1`, `foo`, `tmp` won't index well
78
+
79
+ If it breaks on your repo, that's the feedback I want most.
80
+
81
+ ## Try it
82
+
83
+ ```bash
84
+ npm i -g xindex
85
+ ```
86
+
87
+ - **npm**: [npmjs.com/package/xindex](https://www.npmjs.com/package/xindex)
88
+
89
+ What I'd love to hear: does the quality hold up on your repo? What does it fumble? What would make it genuinely useful day-to-day?
90
+
91
+ DM me directly:
92
+
93
+ - **X / Twitter**: [@slavahatnuke](https://x.com/slavahatnuke)
94
+ - **LinkedIn**: [slava-xatnuk](https://www.linkedin.com/in/slava-xatnuk/)
95
+
96
+ ---
97
+
98
+ <!-- APPENDIX — not part of the post, for screenshots only -->
99
+
100
+ ## Screenshot queries (for the post)
101
+
102
+ ### Terminal demo (`xindex-search` output)
103
+
104
+ Good candidates — run inside the xindex repo itself to get clean, relatable results:
105
+
106
+ 1. `xindex-search "where is the MCP server registered"`
107
+ 2. `xindex-search "file watcher debounce"`
108
+ 3. `xindex-search "how keywords are extracted"`
109
+ 4. `xindex-search "gitignore handling"`
110
+ 5. `xindex-search "how is the vector index stored"`
111
+
112
+ Pick the one with the cleanest output. #1 tends to match well because "MCP" is distinctive.
113
+
114
+ ### Claude Code + xindex (mid-post screenshot)
115
+
116
+ Open Claude Code in a project with xindex wired into `.mcp.json`. Ask it something where you can visibly see it invoke `xindex_search`:
117
+
118
+ 1. *"Where does xindex handle the file watcher lock?"*
119
+ 2. *"Show me how the MCP server wires up its tools."*
120
+ 3. *"How does indexing decide what to skip?"*
121
+
122
+ The win is the screenshot showing the tool call panel — Claude asking `xindex_search` and getting real paths back. That's the image that sells MCP integration.
123
+
124
+ ## Cover image — Gemini prompt
125
+
126
+ Pick one direction:
127
+
128
+ All options: **light background, airy, meaningful, minimal.**
129
+
130
+ **Option A — constellation of meaning:**
131
+ > A minimalist editorial illustration on a soft off-white background. A small cluster of delicate paper-thin file cards floats in the center, connected by fine pastel threads that converge toward one highlighted card in gentle focus. Hints of pale blue, soft coral, and warm sand. Lots of negative space. No text. Flat 2D style with subtle grain. 16:9.
132
+
133
+ **Option B — finding the thread:**
134
+ > A light, airy illustration on a pale cream background. A single thin glowing line weaves through a loose scatter of abstract document shapes and lands on one, softly illuminating it. Muted pastels: sky blue, soft peach, mint. Calm, almost meditative. Generous negative space. No text. Minimal editorial style. 16:9.
135
+
136
+ **Option C — lens on meaning:**
137
+ > A clean, bright illustration on white. A simple line-drawn magnifying glass hovers over a gently organized pattern of small abstract symbols; the symbols inside the lens rearrange into a neat constellation while those outside stay scattered. Warm pastel accents — peach, sage, sky. Thin lines, soft shadows, plenty of whitespace. No text. 16:9.
138
+
139
+ My pick: **B** — "finding the thread" maps directly to what xindex does (one connection through the noise), reads well at thumbnail size, and stays quiet enough not to fight the headline.
@@ -0,0 +1,102 @@
1
+ # xindex — social posts & announcements
2
+
3
+ Links:
4
+ - **npm**: https://www.npmjs.com/package/xindex
5
+ - **Medium**: https://medium.com/@slavahatnuke/xindex-local-semantic-code-search-with-an-mcp-server-built-in-4a74c24d62b7
6
+ - **Launch tweet (pinned)**: https://x.com/slavahatnuke/status/2045214244367470721
7
+
8
+ ---
9
+
10
+ ## LinkedIn — announcement (main version)
11
+
12
+ > Just shipped **xindex** — a small tool I built to solve a problem I kept hitting: finding "the part that handles X" in unfamiliar codebases.
13
+ >
14
+ > grep matches text. xindex matches meaning. Fully local — your code never leaves your machine.
15
+ >
16
+ > It also runs as an MCP server, which means Claude Code (and any MCP-compatible assistant) can search your repo directly. The hallucinations drop. The round-trips drop. That's the part I'm most excited about.
17
+ >
18
+ > `npm i -g xindex`
19
+ >
20
+ > I wrote up the why, the how, and honest limitations here:
21
+ > 👉 https://medium.com/@slavahatnuke/xindex-local-semantic-code-search-with-an-mcp-server-built-in-4a74c24d62b7
22
+ >
23
+ > Would love feedback from anyone using Claude Code day-to-day — especially what breaks on your repo.
24
+ >
25
+ > #DeveloperTools #MCP #ClaudeCode #AI #OpenSource
26
+
27
+ ---
28
+
29
+ ## LinkedIn — shorter variant (2–3 lines)
30
+
31
+ > Shipped **xindex** — local semantic code search for your codebase, with an MCP server built in so Claude Code can use it directly.
32
+ >
33
+ > `npm i -g xindex` · write-up 👉 https://medium.com/@slavahatnuke/xindex-local-semantic-code-search-with-an-mcp-server-built-in-4a74c24d62b7
34
+ >
35
+ > Feedback welcome 🙏
36
+ >
37
+ > #DeveloperTools #MCP #ClaudeCode
38
+
39
+ ---
40
+
41
+ ## LinkedIn — longer narrative variant
42
+
43
+ > A few weeks ago I got tired of grepping through a codebase I didn't write, trying to find "the part that handles auth retries." `grep retry` gave me 200 matches. `grep auth` gave me 400. I knew what I wanted — I just couldn't spell it in one string.
44
+ >
45
+ > So I built **xindex**.
46
+ >
47
+ > It's a small tool that indexes a codebase and lets you search it by natural-language meaning. Fully local — nothing leaves your machine. And because it runs as an MCP server, Claude Code (or any MCP-compatible assistant) can call it directly to find relevant files without inventing paths.
48
+ >
49
+ > The assistant-integration part is what I'm most excited about. Semantic search is one of the highest-leverage tools you can hand to an AI working in an unfamiliar repo.
50
+ >
51
+ > `npm i -g xindex`
52
+ >
53
+ > Full write-up — why I built it, how it works, honest limitations:
54
+ > 👉 https://medium.com/@slavahatnuke/xindex-local-semantic-code-search-with-an-mcp-server-built-in-4a74c24d62b7
55
+ >
56
+ > If you use Claude Code day-to-day, I'd love to hear what breaks on your repo.
57
+ >
58
+ > #DeveloperTools #MCP #ClaudeCode #AI #OpenSource
59
+
60
+ ---
61
+
62
+ ## X / Twitter — reply to pinned tweet
63
+
64
+ Reply thread under the pinned launch tweet to extend its reach:
65
+
66
+ > Wrote up the full story on Medium — why I built it, how it works, and what it won't do.
67
+ >
68
+ > https://medium.com/@slavahatnuke/xindex-local-semantic-code-search-with-an-mcp-server-built-in-4a74c24d62b7
69
+
70
+ ---
71
+
72
+ ## X / Twitter — quote-tweet (24–48h after launch)
73
+
74
+ Quote your own pinned tweet to revive it:
75
+
76
+ > A few days in — some early feedback coming in. If you missed it, here's the write-up:
77
+ >
78
+ > https://medium.com/@slavahatnuke/xindex-local-semantic-code-search-with-an-mcp-server-built-in-4a74c24d62b7
79
+
80
+ ---
81
+
82
+ ## Publishing checklist
83
+
84
+ - [ ] Pin launch tweet on X
85
+ - [ ] Reply to pinned tweet with Medium link
86
+ - [ ] Post LinkedIn announcement (main version)
87
+ - [ ] Pin LinkedIn post to profile
88
+ - [ ] Drop Medium link in any relevant Slack / Discord communities you're in
89
+ - [ ] Submit Medium post to a publication (Better Programming / Level Up Coding / ILLUMINATION) if you're a contributor
90
+ - [ ] 48h later: quote-tweet own pinned tweet for a second wave
91
+
92
+ ---
93
+
94
+ ## Ongoing content ideas (post-launch)
95
+
96
+ For later, when you have something to say:
97
+
98
+ 1. **Demo GIF** — record `xindex-index` + `xindex-search` on a real repo; post as standalone tweet
99
+ 2. **"grep vs xindex" side-by-side** — the punchier tweet variant you considered, once you have usage to back it up
100
+ 3. **Claude Code screencast** — record Claude invoking `xindex_search` and answering a real question; post on X + LinkedIn
101
+ 4. **Lessons / numbers** — after a week: "xindex hit N installs, here's what I learned"
102
+ 5. **Feature posts** — as you add capabilities, short posts on each
package/package.json CHANGED
@@ -1,6 +1,6 @@
1
1
  {
2
2
  "name": "xindex",
3
- "version": "1.0.0",
3
+ "version": "1.0.1",
4
4
  "description": "Local semantic code search — index codebase, search by meaning or keywords",
5
5
  "type": "module",
6
6
  "main": "xindex.ts",
@@ -1,79 +0,0 @@
1
- # Research: File Watching in Node.js (2026)
2
-
3
- ## Question 1: fs.watch recursive — platform support?
4
-
5
- **macOS** — native FSEvents backend, recursive works perfectly since early Node versions.
6
- **Windows** — native ReadDirectoryChangesW, recursive works since early Node versions.
7
- **Linux** — added in Node ~19 via [PR #45098](https://github.com/nodejs/node/pull/45098) (Oct 2022). Uses inotify (opens one fd per directory, not native recursive). Had race condition bug in Node 20.3.0 ([#48437](https://github.com/nodejs/node/issues/48437)), fixed in [PR #51406](https://github.com/nodejs/node/pull/51406). Also had crash-on-delete bug, fixed in [commit e7d0d80](https://github.com/nodejs/node/commit/e7d0d804b2).
8
-
9
- **Status:** Recursive fs.watch works on all three platforms in Node 22+. Linux implementation is stable after fixes.
10
-
11
- ## Question 2: chokidar vs fs.watch — still needed?
12
-
13
- **chokidar v5** (Nov 2025):
14
- - ESM-only, min Node 20, TypeScript rewrite
15
- - Deps reduced from 13 → 1
16
- - Still uses fs.watch as primary backend, normalizes events
17
- - Events: `add`, `addDir`, `change`, `unlink`, `unlinkDir`, `ready`
18
- - ~30M repos, de facto standard
19
- - API: event emitter pattern (`watcher.on("add", path => ...)`)
20
-
21
- **When chokidar adds value:**
22
- - Cross-platform consistency (normalizes all platform quirks)
23
- - Glob pattern matching (no longer applies: glob support was removed in v4)
24
- - Handles edge cases: atomic writes, duplicate events, initial scan
25
- - `ready` event (know when initial scan is done)
26
-
27
- **When fs.watch is sufficient:**
28
- - Single platform or modern Node (22+)
29
- - Simple needs (just file paths + change type)
30
- - Already have debouncing infrastructure
31
- - Prefer async iterable over event emitter
32
-
33
- ## Question 3: @parcel/watcher and alternatives?
34
-
35
- **@parcel/watcher** — native C++ addon. Backends: FSEvents (macOS), inotify (Linux), ReadDirectoryChangesW (Windows). Most performant for large codebases. Heavy dep (native addon build). Vite considered switching to it from chokidar ([#12495](https://github.com/vitejs/vite/issues/12495)).
36
-
37
- **node-watch** — thin wrapper over fs.watch, adds recursive support for Linux. Lighter than chokidar.
38
-
39
- **watchpack** — webpack's watcher. Used chokidar under the hood in 1.x; 2.x watches via native `fs.watch` directly.
40
-
41
- None of these add significant value over chokidar or native fs.watch for our use case.
42
-
43
- ## Question 4: fs.watch known issues + best practices?
44
-
45
- **Issues:**
46
- - Duplicate events per single file save (editor writes temp → rename → delete)
47
- - Null filenames on some platforms/scenarios
48
- - "rename" event is ambiguous (create, delete, or rename)
49
- - No built-in debouncing
50
-
51
- **Best practices:**
52
- - Debounce: 50-200ms window to batch rapid events
53
- - Stat validation: after event, `stat()` to check if file exists and get mtime
54
- - Resource cleanup: always `watcher.close()` on shutdown
55
- - Path handling: `fs.watch` gives filename relative to watched dir, need `path.join`
56
-
57
- ## Decision: fs.watch for xindex
58
-
59
- **Recommendation: native `fs.watch`** with our own debouncing via streamx `batchTimed`.
60
-
61
- **Why:**
62
- 1. Zero new deps — project is private, macOS primary, Node 22+ assumed
63
- 2. Async iterable — `watch` from `node:fs/promises` returns `AsyncIterable<FileChangeInfo>` (the callback-style `fs.watch` returns an `FSWatcher`), so it fits the streamx architecture naturally (no adapter needed)
64
- 3. Debouncing covered — `batchTimed(20, 150)` already in streamx handles the duplicate event problem
65
- 4. Stat validation — simple: `stat()` after event, exists → index, throws → remove
66
- 5. Simpler shutdown — close watcher handle vs chokidar's async `.close()`
67
-
68
- **Tradeoff accepted:** more manual edge case handling (null filenames, dedup). Acceptable for a private tool with batchTimed already available.
69
-
70
- **If issues arise:** chokidar v5 is 1 dep away, same ESM/Node 20+ requirements, drop-in upgrade path. Not worth adding preemptively.
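
Editor's sketch (not the package's code): the decision above leaves dedup, null filenames, and stat validation to the caller. One way to handle a single debounced batch is shown below. The names `planBatch`, `toFullPath`, and `existsOnDisk` are hypothetical; the two callbacks are injected so the logic stays free of I/O and can be exercised without touching the file system. The event shape follows what Node's watch APIs report (`eventType` plus a possibly-null `filename`).

```typescript
// Event shape mirrors what Node's watch APIs report: { eventType, filename }.
type WatchEvent = { eventType: "rename" | "change"; filename: string | null };
type Plan = { index: string[]; remove: string[] };

// Turn one debounced batch of events into "index these / remove these".
function planBatch(
  batch: WatchEvent[],
  toFullPath: (relative: string) => string, // e.g. (f) => path.join(watchedDir, f)
  existsOnDisk: (fullPath: string) => boolean, // e.g. true when stat() succeeds
): Plan {
  const seen = new Set<string>();
  const plan: Plan = { index: [], remove: [] };
  for (const { filename } of batch) {
    if (filename === null) continue; // null filename: nothing to act on
    const full = toFullPath(filename);
    if (seen.has(full)) continue; // duplicate events for a single save
    seen.add(full);
    // "rename" is ambiguous, so ignore eventType and trust the disk instead.
    (existsOnDisk(full) ? plan.index : plan.remove).push(full);
  }
  return plan;
}

// Usage with fake callbacks: one file still exists, one was deleted.
const onDisk = new Set(["/repo/src/a.ts"]);
const plan = planBatch(
  [
    { eventType: "rename", filename: "src/a.ts" },
    { eventType: "change", filename: "src/a.ts" },
    { eventType: "rename", filename: "src/gone.ts" },
    { eventType: "change", filename: null },
  ],
  (f) => `/repo/${f}`,
  (p) => onDisk.has(p),
);
console.log(plan);
```

In real use the batch would come from the debounced watcher stream, `toFullPath` would wrap `path.join`, and `existsOnDisk` would wrap a `stat()` call, matching the "exists → index, throws → remove" rule in the list above.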
71
-
72
- ## Sources
73
-
74
- - [Node.js fs.watch recursive Linux PR #45098](https://github.com/nodejs/node/pull/45098)
75
- - [Node 20 recursive bug #48437](https://github.com/nodejs/node/issues/48437)
76
- - [Chokidar v5 README](https://github.com/paulmillr/chokidar/blob/main/README.md)
77
- - [Vite fs.watch discussion #12495](https://github.com/vitejs/vite/issues/12495)
78
- - [@parcel/watcher](https://github.com/parcel-bundler/watcher)
79
- - [fs.watch best practices](https://www.w3tutorials.net/blog/nodejs-fs-watch/)
@@ -1,129 +0,0 @@
1
- # MCP Tool Output Format for LLM Consumption
2
-
3
- **Question**: What output format should our xindex_search MCP tool use to return search results to an LLM?
4
-
5
- **Current state**: `JSON.stringify(results, null, 2)` — pretty-printed JSON with score, id, meta.keywords, meta.file (id and meta.file are redundant).
6
-
7
- ---
8
-
9
- ## Findings
10
-
11
- ### 1. Token efficiency benchmarks (ImprovingAgents, Oct 2025)
12
-
13
- **Nested data** — 1,000 questions, 3 models, 4 formats:
14
-
15
- | Format | Tokens | GPT-5 Nano | Gemini 2.5 Flash Lite |
16
- |----------|---------|------------|----------------------|
17
- | Markdown | 38,357 | 54.3% | 48.2% |
18
- | YAML | 42,477 | 62.1% | 51.9% |
19
- | JSON | 57,933 | 50.3% | 43.1% |
20
- | XML | 68,804 | 44.4% | 33.8% |
21
-
22
- Markdown uses **34% fewer tokens** than JSON. YAML has better accuracy but more tokens.
23
-
24
- **Flat/tabular data** — 11 formats, 1,000 queries, GPT-4.1-nano:
25
-
26
- | Format | Accuracy | Tokens | Efficiency |
27
- |----------------|----------|---------|------------|
28
- | Markdown-KV | 60.7% | 52,104 | Best accuracy |
29
- | Markdown Table | 51.9% | 25,140 | Best ratio |
30
- | JSON | 52.3% | 66,396 | Mediocre |
31
- | CSV | 44.3% | 19,524 | Cheapest but worst |
32
-
33
- For flat data (which our search results are), **Markdown-KV** gives best LLM comprehension. A numbered list with `key: value` pairs is effectively Markdown-KV.
34
-
35
- Sources: [Nested formats](https://www.improvingagents.com/blog/best-nested-data-format/), [Table formats](https://www.improvingagents.com/blog/best-input-data-format-for-llms/)
36
-
37
- ### 2. MCP spec guidance (June 2025)
38
-
39
- - `content` (TextContent) = what the LLM reads
40
- - `structuredContent` = machine-to-machine, optional
41
- - Spec's own example uses **plain text**: `"Current weather in New York:\nTemperature: 72°F\nConditions: Partly cloudy"`
42
- - If `outputSchema` is defined, SHOULD return both `structuredContent` AND serialized JSON in TextContent for backwards compat
43
-
44
- The spec explicitly shows plain text as the standard tool result format for LLM consumption.
45
-
46
- Source: [MCP Tools Spec](https://modelcontextprotocol.io/specification/2025-06-18/server/tools)
47
-
48
- ### 3. What popular MCP servers do
49
-
50
- | Server | Output format |
51
- |-------------|--------------|
52
- | Perplexity | AI-synthesized text + citation URLs |
53
- | Context7 | Plain text documentation snippets |
54
- | markdownify | Markdown (entire category exists for this) |
55
- | Elasticsearch | JSON (machine-oriented) |
56
-
57
- LLM-facing servers use text/markdown. Only machine-oriented servers use JSON.
58
-
59
- ### 4. JSON specifically degrades LLM reasoning
60
-
61
- - Aider benchmarks: JSON wrapping reduces code reasoning quality by 10-15% ([source](https://aider.chat/2024/08/14/code-in-json.html))
62
- - arxiv paper: frontier models top out at ~77% accuracy on JSON processing tasks ([source](https://arxiv.org/html/2510.15955v1))
63
- - OpenAI community: Markdown is 15% more token-efficient than JSON ([source](https://community.openai.com/t/markdown-is-15-more-token-efficient-than-json/841742))
64
-
65
- ### 5. TOON format (Nov 2025) — not recommended
66
-
67
- New token-optimized format. Mixed results: 73.9% on flat retrieval but **last place** (43.1%) on nested data. Immature ecosystem, no MCP support. Not applicable here.
68
-
69
- Source: [TOON benchmarks](https://www.improvingagents.com/blog/toon-benchmarks/)
70
-
71
- ### 6. Workato design guidelines
72
-
73
- - Return only necessary fields — avoid sending 200+ fields when 3 suffice
74
- - Preprocess/summarize large content before returning to LLM
75
- - Consider token efficiency — "excessive data can overwhelm the AI agent"
76
-
77
- Source: [Workato MCP Tool Design](https://docs.workato.com/en/mcp/mcp-server-tool-design.html)
78
-
79
- ---
80
-
81
- ## Analysis
82
-
83
- Our search results are **flat data** with 3 fields per result (score, file path, keywords). This is the simplest case:
84
-
85
- | Approach | Tokens/result | LLM quality | Fit |
86
- |----------|--------------|-------------|-----|
87
- | Pretty JSON (current) | ~55 | Worst — syntax overhead | Bad |
88
- | Compact JSON | ~22 | OK but cryptic keys | Meh |
89
- | Markdown numbered list | ~12 | Best — Markdown-KV pattern | Best |
90
- | TSV | ~15 | OK but less natural | OK |
91
-
92
- The markdown numbered list matches the **Markdown-KV** pattern that scored highest (60.7%) in flat data benchmarks. It's also **77% fewer tokens** than current JSON.
93
-
94
- Additional advantages:
95
- - File path is visually prominent (it's what the LLM acts on next)
96
- - Score at 2 decimals is sufficient ranking signal
97
- - Keywords give semantic context without opening the file
98
- - Zero structural noise (no braces, brackets, quotes, commas)
99
- - Matches how Perplexity/Context7 format their responses
100
-
101
- No significant trade-offs: we don't need machine-parseability (the consumer is always an LLM), and there's no nested data to worry about.
102
-
103
- ---
104
-
105
- ## Recommendation
106
-
107
- **Switch to markdown numbered list.**
108
-
109
- ```
110
- Search: "authentication flow" — 3 result(s)
111
-
112
- 1. src/components/auth.ts (0.87) — authentication, login, session, token
113
- 2. src/middleware/jwt.ts (0.81) — jwt, token, verify, middleware
114
- 3. src/routes/login.ts (0.74) — login, form, credentials, redirect
115
- ```
116
-
117
- Implementation in `mcpApp.ts`:
118
- ```ts
119
- const header = `Search: "${query}" — ${results.length} result(s)\n\n`;
120
- const lines = results.map((r, i) =>
121
- `${i + 1}. ${r.id} (${r.score.toFixed(2)}) — ${r.meta.keywords ?? ""}`
122
- );
123
- const text = header + lines.join("\n");
124
- return {content: [{type: "text" as const, text}]};
125
- ```
126
-
127
- Empty case: `No results for "${query}"` — avoids confusing the model with an empty list.
128
-
129
- **Future consideration**: Add `outputSchema` + `structuredContent` when clients start using it, but keep TextContent as the primary format for LLM consumption.
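
Editor's sketch: the `mcpApp.ts` snippet and the empty-case rule above can be folded into one self-contained function. The `SearchResult` type and the name `formatSearchResults` are assumptions for illustration; in particular, `meta.keywords` is assumed to be an optional, already comma-joined string, which the original snippet does not confirm.

```typescript
// Assumed result shape: id is the file path, meta.keywords an optional joined string.
type SearchResult = { id: string; score: number; meta: { keywords?: string } };

// Separator used in the recommended output format, written as an escape.
const DASH = "\u2014";

function formatSearchResults(query: string, results: SearchResult[]) {
  // Empty case: a plain sentence instead of a header followed by an empty list.
  if (results.length === 0) {
    return { content: [{ type: "text" as const, text: `No results for "${query}"` }] };
  }
  const header = `Search: "${query}" ${DASH} ${results.length} result(s)\n\n`;
  const lines = results.map(
    (r, i) => `${i + 1}. ${r.id} (${r.score.toFixed(2)}) ${DASH} ${r.meta.keywords ?? ""}`,
  );
  return { content: [{ type: "text" as const, text: header + lines.join("\n") }] };
}

const formatted = formatSearchResults("authentication flow", [
  { id: "src/components/auth.ts", score: 0.87, meta: { keywords: "authentication, login, session, token" } },
  { id: "src/middleware/jwt.ts", score: 0.81, meta: { keywords: "jwt, token, verify, middleware" } },
]);
console.log(formatted.content[0].text);
```

Keeping the formatter pure makes it easy to unit-test separately from the MCP server wiring.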
package/.ai/task/INDEX.md DELETED
@@ -1,12 +0,0 @@
1
- # Tasks
2
-
3
- - [xindex-mcp — MCP Server for Semantic Code Search](task.2026-04-10-xindex-mcp.md) — wrap xindex as MCP server so Claude Code can search codebase
4
- - [Directory-based Indexing with Async Streams](task.2026-04-10-dir-indexing.md) — accept files/dirs, recursive walk with .gitignore, index via streamx pipeline
5
- - [xindex-watch — Continuous Indexing](task.2026-04-10-watch-indexing.md) — new entry point: index all + watch for changes continuously via merged stream
6
- - [Object Store — Separate Meta from Vectra](task.2026-04-10-object-store.md) — store meta as JSON files in .xindex/objects/, vectra keeps only vectors
7
- - [Line-level Clustering](task.2026-04-10-line-clustering.md) — recursive bisection to split files into semantic blocks, index as file:fromLine-toLine
8
-
9
- - [Search Config — Keyword Ignore & Inline Snippets](task.2026-04-10-search-config.md) — `.xindex.json` config for ignoring noisy keywords + inlining small code clusters in results
10
- - [Cluster Config — Move ClusterLines defaults to .xindex.json](task.2026-04-10-cluster-config.md) — repo-level clustering params (`threshold`, `minLines`, `maxDepth`) instead of hardcoded defaults
11
-
12
- See [done/INDEX.md](done/INDEX.md) for completed tasks.
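
Editor's sketch of the line-level clustering idea listed above (recursive bisection into blocks addressed as `file:fromLine-toLine`, tuned by `threshold`, `minLines`, `maxDepth`). The task files themselves are not shown in this diff, so the split criterion here is an assumption: each range is cut where the mean vectors of the two halves are least similar, and the cut is kept only if that similarity falls below `threshold`. The per-line vectors are toy inputs, and `clusterLines` / `bisect` are hypothetical names.

```typescript
type Block = { fromLine: number; toLine: number }; // 1-based, inclusive
type ClusterOptions = { threshold: number; minLines: number; maxDepth: number };

// Mean of the per-line vectors in the half-open range [from, to).
function meanVector(vectors: number[][], from: number, to: number): number[] {
  const out = new Array<number>(vectors[from].length).fill(0);
  for (let i = from; i < to; i++) {
    for (let d = 0; d < out.length; d++) out[d] += vectors[i][d];
  }
  return out.map((x) => x / (to - from));
}

// Cosine similarity; returns 0 when either vector is all zeros.
function cosineSim(a: number[], b: number[]): number {
  let dot = 0;
  let na = 0;
  let nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return na === 0 || nb === 0 ? 0 : dot / Math.sqrt(na * nb);
}

// Split [from, to) at the boundary where the halves are least similar; stop when
// the halves are similar enough, the range is too short, or the depth limit is hit.
function bisect(vectors: number[][], from: number, to: number, depth: number, opts: ClusterOptions): Block[] {
  const whole: Block[] = [{ fromLine: from + 1, toLine: to }];
  const minLines = Math.max(1, opts.minLines);
  if (depth >= opts.maxDepth || to - from < 2 * minLines) return whole;
  let bestSplit = -1;
  let bestSim = Infinity;
  for (let s = from + minLines; s <= to - minLines; s++) {
    const sim = cosineSim(meanVector(vectors, from, s), meanVector(vectors, s, to));
    if (sim < bestSim) {
      bestSim = sim;
      bestSplit = s;
    }
  }
  if (bestSplit < 0 || bestSim >= opts.threshold) return whole;
  return [
    ...bisect(vectors, from, bestSplit, depth + 1, opts),
    ...bisect(vectors, bestSplit, to, depth + 1, opts),
  ];
}

function clusterLines(vectors: number[][], opts: ClusterOptions): Block[] {
  return vectors.length === 0 ? [] : bisect(vectors, 0, vectors.length, 0, opts);
}

// Four lines: the first two point one way, the last two another.
const blocks = clusterLines(
  [[1, 0], [1, 0], [0, 1], [0, 1]],
  { threshold: 0.5, minLines: 1, maxDepth: 4 },
);
const ids = blocks.map((b) => `src/app.ts:${b.fromLine}-${b.toLine}`);
console.log(ids);
```

With these toy vectors the file splits into two blocks, `src/app.ts:1-2` and `src/app.ts:3-4`; raising `minLines` or lowering `threshold` makes the splitter more conservative.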