npm - xindex - Versions diffs - 1.0.0 → 1.0.1 - Mend

xindex 1.0.0 → 1.0.1

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (27)

package/.ai/research/.gitkeep +0 -0
package/.ai/task/.gitkeep +0 -0
package/README.md +54 -89
package/media/MEDIUM.md +139 -0
package/media/SOCIAL.md +102 -0
package/package.json +1 -1
package/.ai/research/2026-04-10-file-watching.md +0 -79
package/.ai/research/2026-04-10-mcp-output-format.md +0 -129
package/.ai/task/INDEX.md +0 -12
package/.ai/task/done/INDEX.md +0 -3
package/.ai/task/done/task.2026-04-09-local-ai-research-protos.log.md +0 -98
package/.ai/task/done/task.2026-04-09-local-ai-research-protos.md +0 -102
package/.ai/task/task.2026-04-10-cluster-config.log.md +0 -19
package/.ai/task/task.2026-04-10-cluster-config.md +0 -118
package/.ai/task/task.2026-04-10-dir-indexing.log.md +0 -8
package/.ai/task/task.2026-04-10-dir-indexing.md +0 -92
package/.ai/task/task.2026-04-10-line-clustering.log.md +0 -50
package/.ai/task/task.2026-04-10-line-clustering.md +0 -176
package/.ai/task/task.2026-04-10-object-store.log.md +0 -7
package/.ai/task/task.2026-04-10-object-store.md +0 -81
package/.ai/task/task.2026-04-10-search-config.log.md +0 -46
package/.ai/task/task.2026-04-10-search-config.md +0 -274
package/.ai/task/task.2026-04-10-watch-indexing.log.md +0 -32
package/.ai/task/task.2026-04-10-watch-indexing.md +0 -101
package/.ai/task/task.2026-04-10-xindex-mcp.log.md +0 -5
package/.ai/task/task.2026-04-10-xindex-mcp.md +0 -92
package/.ai/task/task.2026-04-10-xindex-mcp.report.md +0 -113

package/.ai/research/.gitkeep ADDED

File without changes

package/.ai/task/.gitkeep ADDED

File without changes

package/README.md CHANGED

@@ -1,37 +1,16 @@
 # xindex
-Local semantic code search. Index your codebase, search by meaning — no cloud, no API keys. Also runs as an MCP server so Claude Code (and other MCP clients) can search your repo directly.
+**grep matches text. xindex matches meaning.**
-## Features
-- **Local** — everything runs on your machine; embeddings cached on disk
-- **Semantic search** — natural-language queries, not just substring match
-- **MCP server** — plug into Claude Code via `.mcp.json`
-- **Watch mode** — keep the index warm while you code
-- **Gitignore-aware** — respects `.gitignore` + custom ignore rules
-- **Zero config** — works with defaults; `.xindex.json` is optional
-## How it fits together
-```
-  your repo                      xindex
-  ─────────                      ──────
-  *.ts / *.md  ──►  walk  ──►  keywords  ──►  embed  ──►  .xindex/
-  .gitignore                                              (vectra index)
-                                                              ▲
-  CLI / MCP  ◄──  search  ◄──  embed query  ◄──  "question"  ┘
-```
+Local semantic code search for your codebase — plus an MCP server so Claude Code (and any MCP client) can search your repo directly. Fully local, no cloud, no API keys.
 ## Install
 ```bash
-git clone <repo-url> xindex
-cd xindex
-yarn install        # or npm install
-npm link            # makes xindex-* binaries + xindex-mcp available on PATH
+npm i -g xindex
 ```
-Requires Node.js. First run downloads the embedding model (`all-MiniLM-L6-v2`, ~25MB) — after that, fully offline.
+First run downloads a small embedding model (`all-MiniLM-L6-v2`, ~23MB). After that, fully offline.
 ## Quick start
@@ -43,7 +22,35 @@ xindex-search "where is auth handled"  # ask a question
 Index lives in `./.xindex/` — add it to `.gitignore`.
-## CLI
+## Use with Claude Code (MCP)
+Drop this into `.mcp.json` at your project root:
+```json
+{
+  "mcpServers": {
+    "xindex": {
+      "command": "xindex-mcp",
+      "args": []
+    }
+  }
+}
+```
+Open the project in Claude Code — it picks up the xindex MCP server and can call `xindex_search`, `xindex_index`, and `xindex_reset` directly. Fewer hallucinations, fewer round-trips.
+## Features
+- **Local** — everything runs on your machine; embeddings cached on disk
+- **Semantic search** — natural-language queries, not substring match
+- **MCP server** — plugs into Claude Code via `.mcp.json`
+- **Watch mode** — keeps the index warm while you code
+- **Gitignore-aware** — respects `.gitignore` + custom ignore rules
+- **Zero config** — works with defaults; `.xindex.json` is optional
+---
+## CLI reference
 All five binaries run from any directory; they index/search the current working directory.
@@ -81,24 +88,7 @@ xindex-mcp --watch-disabled  # no watch
 xindex-mcp --watch-dir=./src # watch a specific dir
 ```
-## MCP (Claude Code & others)
-Drop this into `.mcp.json` at your project root:
-```json
-{
-  "mcpServers": {
-    "xindex": {
-      "command": "xindex-mcp",
-      "args": []
-    }
-  }
-}
-```
-Requires `xindex-mcp` on PATH (`npm link` inside this repo does it). If you'd rather pin to an absolute path, use `/absolute/path/to/bin/xindex-mcp`.
-### Tools exposed
+## MCP tools
 | Tool | What it does | Input |
 |------|--------------|-------|
@@ -108,14 +98,6 @@ Requires `xindex-mcp` on PATH (`npm link` inside this repo does it). If you'd ra
 Note: CLI `xindex-search` defaults to 10 results; MCP `xindex_search` defaults to 5.
-### Typical Claude Code flow
-1. Commit `.mcp.json` to your repo.
-2. Open the project in Claude Code — it picks up the xindex MCP server.
-3. Ask it to call `xindex_index` once with `inputs: ["."]`.
-4. From then on, it uses `xindex_search` for natural-language lookups.
-5. Watch mode keeps the index fresh as you edit.
 ## Configuration
 ### `.xindex.json` (optional)
@@ -140,57 +122,29 @@ Created automatically. Contains:
 **Always gitignore it.**
-### `.gitignore`
+### `.gitignore` minimum
-Minimum:
 ```
 .xindex
 node_modules/
 ```
-## Examples
-### Index + search from the terminal
-```bash
-cd my-app
-xindex-index .
-xindex-search "rate limiter implementation"
-```
-### Keep the index warm while coding
-```bash
-xindex-watch .
-# edit files in another terminal; index updates incrementally
-# Ctrl+C to stop
-```
-### Use from Claude Code via MCP
-```bash
-# one-time setup
-cd xindex && npm link
-# in your project
-echo '{"mcpServers":{"xindex":{"command":"xindex-mcp","args":[]}}}' > .mcp.json
+## How it fits together
-# open project in Claude Code — xindex tools are available
 ```
-### Run MCP without watching
-```bash
-xindex-mcp --watch-disabled
+  your repo                      xindex
+  ─────────                      ──────
+  *.ts / *.md  ──►  walk  ──►  keywords  ──►  embed  ──►  .xindex/
+  .gitignore                                              (vectra index)
+                                                              ▲
+  CLI / MCP  ◄──  search  ◄──  embed query  ◄──  "question"  ┘
 ```
-You control when reindexing happens via explicit `xindex_index` calls.
 ## Project layout
 ```
 apps/        entry points (run.*.ts) + app composers (IndexApp, SearchApp, McpApp, ...)
-bin/         shebang wrappers invoked by npm/yarn and .mcp.json
+bin/         shebang wrappers invoked by npm and .mcp.json
 componets/   shared building blocks: config, walk, watch, embed, vectra adapter, logger
 features/    domain operations: indexContent, searchIndex, removeContent, resetIndex
 packages/    small internal libs (streamx, fun)
@@ -201,6 +155,17 @@ packages/    small internal libs (streamx, fun)
 See [CLAUDE.md](CLAUDE.md) for contributor conventions (HOF pattern, logger rules, task workflow).
+## Development
+Working on xindex itself? Clone and link:
+```bash
+git clone <repo-url> xindex
+cd xindex
+yarn install   # or npm install
+npm link       # exposes xindex-* binaries from your working copy
+```
 ## License
 MIT

package/media/MEDIUM.md ADDED

@@ -0,0 +1,139 @@
+# xindex — local semantic code search, with an MCP server built in
+> Index your repo, search by meaning, no cloud. Works with Claude Code out of the box.
+---
+## The problem
+You land in a 50,000-line codebase someone else wrote. You need "the part that handles auth retries."
+```
+grep -r "retry" .
+```
+200 matches. Most unrelated. You try `auth`. 400 matches.
+You know what you want. You just can't spell it in one string.
+## Why I built this
+I wanted grep's ergonomics with a real search engine's semantics — but running entirely on my laptop. Cloud code search solves the meaning problem, but I'm not uploading half-finished private repos to anyone's servers. IDE search works inside files you've already found. Neither answers *"where in this entire repo is X done?"* without the round-trip to the cloud.
+So I built a small tool that indexes a repo once and lets me ask it questions. Then the MCP ecosystem happened, and the same index became the single most useful thing I could hand to Claude Code.
+## What xindex is
+A small CLI that indexes your codebase and lets you search it by natural-language meaning. It also runs as an MCP server, so Claude Code (or any MCP client) can call it directly.
+- **Local** — nothing leaves your machine
+- **Semantic** — natural-language queries, not substring matches
+- **MCP built in** — four lines of JSON to wire into Claude Code
+- **Watch mode** — keeps the index fresh as you edit
+- **Gitignore-aware** — plus your own rules
+What it's *not*: not a grep replacement (exact strings — grep wins), not code intelligence (no symbols/refs — your IDE wins). It's a focused semantic index. Nothing more.
+## 30 seconds, end to end
+```bash
+npm i -g xindex
+cd my-project
+xindex-index .
+xindex-search "rate limiter logic"
+```
+First run downloads a ~23MB embedding model (one time). Then you get ranked file paths back — enough to jump straight to the right place.
+## The part I'm actually excited about
+Semantic search is one of the highest-leverage tools you can give an AI assistant in an unfamiliar repo.
+Drop this into `.mcp.json`:
+```json
+{
+  "mcpServers": {
+    "xindex": {
+      "command": "xindex-mcp",
+      "args": []
+    }
+  }
+}
+```
+Open the project in Claude Code. Ask about your codebase. Watch Claude call `xindex_search` and come back with real file references instead of invented ones.
+The hallucinations drop. The round-trips drop. That alone was worth shipping.
+## What I'm not pretending
+v1 of a tool I built for myself:
+- First run needs network (model download)
+- One repo at a time; no cross-repo search
+- No AST awareness — works on keywords, not structure
+- Quality depends on descriptive names — `x1`, `foo`, `tmp` won't index well
+If it breaks on your repo, that's the feedback I want most.
+## Try it
+```bash
+npm i -g xindex
+```
+- **npm**: [npmjs.com/package/xindex](https://www.npmjs.com/package/xindex)
+What I'd love to hear: does the quality hold up on your repo? What does it fumble? What would make it genuinely useful day-to-day?
+DM me directly:
+- **X / Twitter**: [@slavahatnuke](https://x.com/slavahatnuke)
+- **LinkedIn**: [slava-xatnuk](https://www.linkedin.com/in/slava-xatnuk/)
+---
+<!-- APPENDIX — not part of the post, for screenshots only -->
+## Screenshot queries (for the post)
+### Terminal demo (`xindex-search` output)
+Good candidates — run inside the xindex repo itself to get clean, relatable results:
+1. `xindex-search "where is the MCP server registered"`
+2. `xindex-search "file watcher debounce"`
+3. `xindex-search "how keywords are extracted"`
+4. `xindex-search "gitignore handling"`
+5. `xindex-search "how is the vector index stored"`
+Pick the one with the cleanest output. #1 tends to match well because "MCP" is distinctive.
+### Claude Code + xindex (mid-post screenshot)
+Open Claude Code in a project with xindex wired into `.mcp.json`. Ask it something where you can visibly see it invoke `xindex_search`:
+1. *"Where does xindex handle the file watcher lock?"*
+2. *"Show me how the MCP server wires up its tools."*
+3. *"How does indexing decide what to skip?"*
+The win is the screenshot showing the tool call panel — Claude asking `xindex_search` and getting real paths back. That's the image that sells MCP integration.
+## Cover image — Gemini prompt
+Pick one direction:
+All options: **light background, airy, meaningful, minimal.**
+**Option A — constellation of meaning:**
+> A minimalist editorial illustration on a soft off-white background. A small cluster of delicate paper-thin file cards floats in the center, connected by fine pastel threads that converge toward one highlighted card in gentle focus. Hints of pale blue, soft coral, and warm sand. Lots of negative space. No text. Flat 2D style with subtle grain. 16:9.
+**Option B — finding the thread:**
+> A light, airy illustration on a pale cream background. A single thin glowing line weaves through a loose scatter of abstract document shapes and lands on one, softly illuminating it. Muted pastels: sky blue, soft peach, mint. Calm, almost meditative. Generous negative space. No text. Minimal editorial style. 16:9.
+**Option C — lens on meaning:**
+> A clean, bright illustration on white. A simple line-drawn magnifying glass hovers over a gently organized pattern of small abstract symbols; the symbols inside the lens rearrange into a neat constellation while those outside stay scattered. Warm pastel accents — peach, sage, sky. Thin lines, soft shadows, plenty of whitespace. No text. 16:9.
+My pick: **B** — "finding the thread" maps directly to what xindex does (one connection through the noise), reads well at thumbnail size, and stays quiet enough not to fight the headline.

package/media/SOCIAL.md ADDED

@@ -0,0 +1,102 @@
+# xindex — social posts & announcements
+Links:
+- **npm**: https://www.npmjs.com/package/xindex
+- **Medium**: https://medium.com/@slavahatnuke/xindex-local-semantic-code-search-with-an-mcp-server-built-in-4a74c24d62b7
+- **Launch tweet (pinned)**: https://x.com/slavahatnuke/status/2045214244367470721
+---
+## LinkedIn — announcement (main version)
+> Just shipped **xindex** — a small tool I built to solve a problem I kept hitting: finding "the part that handles X" in unfamiliar codebases.
+>
+> grep matches text. xindex matches meaning. Fully local — your code never leaves your machine.
+>
+> It also runs as an MCP server, which means Claude Code (and any MCP-compatible assistant) can search your repo directly. The hallucinations drop. The round-trips drop. That's the part I'm most excited about.
+>
+> `npm i -g xindex`
+>
+> I wrote up the why, the how, and honest limitations here:
+> 👉 https://medium.com/@slavahatnuke/xindex-local-semantic-code-search-with-an-mcp-server-built-in-4a74c24d62b7
+>
+> Would love feedback from anyone using Claude Code day-to-day — especially what breaks on your repo.
+>
+> #DeveloperTools #MCP #ClaudeCode #AI #OpenSource
+---
+## LinkedIn — shorter variant (2–3 lines)
+> Shipped **xindex** — local semantic code search for your codebase, with an MCP server built in so Claude Code can use it directly.
+>
+> `npm i -g xindex` · write-up 👉 https://medium.com/@slavahatnuke/xindex-local-semantic-code-search-with-an-mcp-server-built-in-4a74c24d62b7
+>
+> Feedback welcome 🙏
+>
+> #DeveloperTools #MCP #ClaudeCode
+---
+## LinkedIn — longer narrative variant
+> A few weeks ago I got tired of grepping through a codebase I didn't write, trying to find "the part that handles auth retries." `grep retry` gave me 200 matches. `grep auth` gave me 400. I knew what I wanted — I just couldn't spell it in one string.
+>
+> So I built **xindex**.
+>
+> It's a small tool that indexes a codebase and lets you search it by natural-language meaning. Fully local — nothing leaves your machine. And because it runs as an MCP server, Claude Code (or any MCP-compatible assistant) can call it directly to find relevant files without inventing paths.
+>
+> The assistant-integration part is what I'm most excited about. Semantic search is one of the highest-leverage tools you can hand to an AI working in an unfamiliar repo.
+>
+> `npm i -g xindex`
+>
+> Full write-up — why I built it, how it works, honest limitations:
+> 👉 https://medium.com/@slavahatnuke/xindex-local-semantic-code-search-with-an-mcp-server-built-in-4a74c24d62b7
+>
+> If you use Claude Code day-to-day, I'd love to hear what breaks on your repo.
+>
+> #DeveloperTools #MCP #ClaudeCode #AI #OpenSource
+---
+## X / Twitter — reply to pinned tweet
+Reply thread under the pinned launch tweet to extend its reach:
+> Wrote up the full story on Medium — why I built it, how it works, and what it won't do.
+>
+> https://medium.com/@slavahatnuke/xindex-local-semantic-code-search-with-an-mcp-server-built-in-4a74c24d62b7
+---
+## X / Twitter — quote-tweet (24–48h after launch)
+Quote your own pinned tweet to revive it:
+> A few days in — some early feedback coming in. If you missed it, here's the write-up:
+>
+> https://medium.com/@slavahatnuke/xindex-local-semantic-code-search-with-an-mcp-server-built-in-4a74c24d62b7
+---
+## Publishing checklist
+- [ ] Pin launch tweet on X
+- [ ] Reply to pinned tweet with Medium link
+- [ ] Post LinkedIn announcement (main version)
+- [ ] Pin LinkedIn post to profile
+- [ ] Drop Medium link in any relevant Slack / Discord communities you're in
+- [ ] Submit Medium post to a publication (Better Programming / Level Up Coding / ILLUMINATION) if you're a contributor
+- [ ] 48h later: quote-tweet own pinned tweet for a second wave
+---
+## Ongoing content ideas (post-launch)
+For later, when you have something to say:
+1. **Demo GIF** — record `xindex-index` + `xindex-search` on a real repo; post as standalone tweet
+2. **"grep vs xindex" side-by-side** — the punchier tweet variant you considered, once you have usage to back it up
+3. **Claude Code screencast** — record Claude invoking `xindex_search` and answering a real question; post on X + LinkedIn
+4. **Lessons / numbers** — after a week: "xindex hit N installs, here's what I learned"
+5. **Feature posts** — as you add capabilities, short posts on each

package/package.json CHANGED

@@ -1,6 +1,6 @@
 {
   "name": "xindex",
-  "version": "1.0.0",
+  "version": "1.0.1",
   "description": "Local semantic code search — index codebase, search by meaning or keywords",
   "type": "module",
   "main": "xindex.ts",

package/.ai/research/2026-04-10-file-watching.md DELETED

@@ -1,79 +0,0 @@
-# Research: File Watching in Node.js (2026)
-## Question 1: fs.watch recursive — platform support?
-**macOS** — native FSEvents backend, recursive works perfectly since early Node versions.
-**Windows** — native ReadDirectoryChangesW, recursive works since early Node versions.
-**Linux** — added in Node ~19 via [PR #45098](https://github.com/nodejs/node/pull/45098) (Oct 2022). Uses inotify (opens one fd per directory, not native recursive). Had race condition bug in Node 20.3.0 ([#48437](https://github.com/nodejs/node/issues/48437)), fixed in [PR #51406](https://github.com/nodejs/node/pull/51406). Also had crash-on-delete bug, fixed in [commit e7d0d80](https://github.com/nodejs/node/commit/e7d0d804b2).
-**Status:** Recursive fs.watch works on all three platforms in Node 22+. Linux implementation is stable after fixes.
-## Question 2: chokidar vs fs.watch — still needed?
-**chokidar v5** (Nov 2025):
-- ESM-only, min Node 20, TypeScript rewrite
-- Deps reduced from 13 → 1
-- Still uses fs.watch as primary backend, normalizes events
-- Events: `add`, `addDir`, `change`, `unlink`, `unlinkDir`, `ready`
-- ~30M repos, de facto standard
-- API: event emitter pattern (`watcher.on("add", path => ...)`)
-**When chokidar adds value:**
-- Cross-platform consistency (normalizes all platform quirks)
-- Glob pattern matching (removed in v5 actually)
-- Handles edge cases: atomic writes, duplicate events, initial scan
-- `ready` event (know when initial scan is done)
-**When fs.watch is sufficient:**
-- Single platform or modern Node (22+)
-- Simple needs (just file paths + change type)
-- Already have debouncing infrastructure
-- Prefer async iterable over event emitter
-## Question 3: @parcel/watcher and alternatives?
-**@parcel/watcher** — native C++ addon. Backends: FSEvents (macOS), inotify (Linux), ReadDirectoryChangesW (Windows). Most performant for large codebases. Heavy dep (native addon build). Vite considered switching to it from chokidar ([#12495](https://github.com/vitejs/vite/issues/12495)).
-**node-watch** — thin wrapper over fs.watch, adds recursive support for Linux. Lighter than chokidar.
-**watchpack** — webpack's watcher. Uses chokidar under the hood.
-None of these add significant value over chokidar or native fs.watch for our use case.
-## Question 4: fs.watch known issues + best practices?
-**Issues:**
-- Duplicate events per single file save (editor writes temp → rename → delete)
-- Null filenames on some platforms/scenarios
-- "rename" event is ambiguous (create, delete, or rename)
-- No built-in debouncing
-**Best practices:**
-- Debounce: 50-200ms window to batch rapid events
-- Stat validation: after event, `stat()` to check if file exists and get mtime
-- Resource cleanup: always `watcher.close()` on shutdown
-- Path handling: `fs.watch` gives filename relative to watched dir, need `path.join`
-## Decision: fs.watch for xindex
-**Recommendation: native `fs.watch`** with our own debouncing via streamx `batchTimed`.
-**Why:**
-1. Zero new deps — project is private, macOS primary, Node 22+ assumed
-2. Async iterable — `fs.watch` returns `AsyncIterable<FileChangeInfo>`, fits streamx architecture naturally (no adapter needed)
-3. Debouncing covered — `batchTimed(20, 150)` already in streamx handles the duplicate event problem
-4. Stat validation — simple: `stat()` after event, exists → index, throws → remove
-5. Simpler shutdown — close watcher handle vs chokidar's async `.close()`
-**Tradeoff accepted:** more manual edge case handling (null filenames, dedup). Acceptable for a private tool with batchTimed already available.
-**If issues arise:** chokidar v5 is 1 dep away, same ESM/Node 20+ requirements, drop-in upgrade path. Not worth adding preemptively.
-## Sources
-- [Node.js fs.watch recursive Linux PR #45098](https://github.com/nodejs/node/pull/45098)
-- [Node 20 recursive bug #48437](https://github.com/nodejs/node/issues/48437)
-- [Chokidar v5 README](https://github.com/paulmillr/chokidar/blob/main/README.md)
-- [Vite fs.watch discussion #12495](https://github.com/vitejs/vite/issues/12495)
-- [@parcel/watcher](https://github.com/parcel-bundler/watcher)
-- [fs.watch best practices](https://www.w3tutorials.net/blog/nodejs-fs-watch/)
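
Reviewer note: the deleted research file above settles on native `fs.watch` with a debounce window and `stat()` validation (exists → index, throws → remove). A minimal sketch of that combination follows, under stated assumptions: it uses the promise-based `watch` from `node:fs/promises` with `recursive: true`, and it swaps in a plain `setTimeout` window for the note's streamx `batchTimed`, whose signature is not shown in this diff. The names `classify`, `watchDir`, and `Action` are hypothetical and do not come from the package.

```typescript
import { stat, watch } from "node:fs/promises";
import { join } from "node:path";

// Hypothetical action type: what the indexer should do with a changed path.
type Action = { path: string; kind: "index" | "remove" };

// Stat validation as described in the note: if stat() succeeds the path
// exists and should be (re)indexed; if it throws, treat the path as removed.
// Assumption: any stat error (not only ENOENT) maps to "remove".
async function classify(path: string): Promise<Action> {
  try {
    await stat(path);
    return { path, kind: "index" };
  } catch {
    return { path, kind: "remove" };
  }
}

// Watch `dir` recursively, collect changed paths for `windowMs`, then hand
// one deduplicated batch to `onBatch`. Duplicate events for the same file
// inside the window collapse into a single entry because of the Set.
async function watchDir(
  dir: string,
  onBatch: (actions: Action[]) => void,
  signal?: AbortSignal,
  windowMs = 150,
): Promise<void> {
  const pending = new Set<string>();
  let timer: ReturnType<typeof setTimeout> | undefined;

  const flush = async (): Promise<void> => {
    const paths = [...pending];
    pending.clear();
    timer = undefined;
    onBatch(await Promise.all(paths.map(classify)));
  };

  try {
    for await (const event of watch(dir, { recursive: true, signal })) {
      if (!event.filename) continue; // filename can be null on some platforms
      pending.add(join(dir, event.filename)); // filename is relative to `dir`
      timer ??= setTimeout(() => void flush(), windowMs);
    }
  } catch (err) {
    // Aborting the signal ends the loop with an AbortError; anything else is real.
    if ((err as Error).name !== "AbortError") throw err;
  } finally {
    if (timer) clearTimeout(timer); // events still pending at shutdown are dropped
  }
}
```

A caller might create an `AbortController`, start `watchDir("./src", (batch) => console.log(batch), controller.signal)`, and call `controller.abort()` on shutdown. The fixed window here starts at the first event and does not extend on later ones, which is a simplification of whatever `batchTimed(20, 150)` actually does.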

package/.ai/research/2026-04-10-mcp-output-format.md DELETED

@@ -1,129 +0,0 @@
-# MCP Tool Output Format for LLM Consumption
-**Question**: What output format should our xindex_search MCP tool use to return search results to an LLM?
-**Current state**: `JSON.stringify(results, null, 2)` — pretty-printed JSON with score, id, meta.keywords, meta.file (id and meta.file are redundant).
----
-## Findings
-### 1. Token efficiency benchmarks (ImprovingAgents, Oct 2025)
-**Nested data** — 1,000 questions, 3 models, 4 formats:
-| Format   | Tokens  | GPT-5 Nano | Gemini 2.5 Flash Lite |
-|----------|---------|------------|----------------------|
-| Markdown | 38,357  | 54.3%      | 48.2%                |
-| YAML     | 42,477  | 62.1%      | 51.9%                |
-| JSON     | 57,933  | 50.3%      | 43.1%                |
-| XML      | 68,804  | 44.4%      | 33.8%                |
-Markdown uses **34% fewer tokens** than JSON. YAML has better accuracy but more tokens.
-**Flat/tabular data** — 11 formats, 1,000 queries, GPT-4.1-nano:
-| Format         | Accuracy | Tokens  | Efficiency |
-|----------------|----------|---------|------------|
-| Markdown-KV    | 60.7%    | 52,104  | Best accuracy |
-| Markdown Table | 51.9%    | 25,140  | Best ratio |
-| JSON           | 52.3%    | 66,396  | Mediocre |
-| CSV            | 44.3%    | 19,524  | Cheapest but worst |
-For flat data (which our search results are), **Markdown-KV** gives best LLM comprehension. A numbered list with `key: value` pairs is effectively Markdown-KV.
-Sources: [Nested formats](https://www.improvingagents.com/blog/best-nested-data-format/), [Table formats](https://www.improvingagents.com/blog/best-input-data-format-for-llms/)
-### 2. MCP spec guidance (June 2025)
-- `content` (TextContent) = what the LLM reads
-- `structuredContent` = machine-to-machine, optional
-- Spec's own example uses **plain text**: `"Current weather in New York:\nTemperature: 72°F\nConditions: Partly cloudy"`
-- If `outputSchema` is defined, SHOULD return both `structuredContent` AND serialized JSON in TextContent for backwards compat
-The spec explicitly shows plain text as the standard tool result format for LLM consumption.
-Source: [MCP Tools Spec](https://modelcontextprotocol.io/specification/2025-06-18/server/tools)
-### 3. What popular MCP servers do
-| Server       | Output format |
-|-------------|--------------|
-| Perplexity  | AI-synthesized text + citation URLs |
-| Context7    | Plain text documentation snippets |
-| markdownify | Markdown (entire category exists for this) |
-| Elasticsearch | JSON (machine-oriented) |
-LLM-facing servers use text/markdown. Only machine-oriented servers use JSON.
-### 4. JSON specifically degrades LLM reasoning
-- Aider benchmarks: JSON wrapping reduces code reasoning quality by 10-15% ([source](https://aider.chat/2024/08/14/code-in-json.html))
-- arxiv paper: frontier models top out at ~77% accuracy on JSON processing tasks ([source](https://arxiv.org/html/2510.15955v1))
-- OpenAI community: Markdown is 15% more token-efficient than JSON ([source](https://community.openai.com/t/markdown-is-15-more-token-efficient-than-json/841742))
-### 5. TOON format (Nov 2025) — not recommended
-New token-optimized format. Mixed results: 73.9% on flat retrieval but **last place** (43.1%) on nested data. Immature ecosystem, no MCP support. Not applicable here.
-Source: [TOON benchmarks](https://www.improvingagents.com/blog/toon-benchmarks/)
-### 6. Workato design guidelines
-- Return only necessary fields — avoid sending 200+ fields when 3 suffice
-- Preprocess/summarize large content before returning to LLM
-- Consider token efficiency — "excessive data can overwhelm the AI agent"
-Source: [Workato MCP Tool Design](https://docs.workato.com/en/mcp/mcp-server-tool-design.html)
----
-## Analysis
-Our search results are **flat data** with 3 fields per result (score, file path, keywords). This is the simplest case:
-| Approach | Tokens/result | LLM quality | Fit |
-|----------|--------------|-------------|-----|
-| Pretty JSON (current) | ~55 | Worst — syntax overhead | Bad |
-| Compact JSON | ~22 | OK but cryptic keys | Meh |
-| Markdown numbered list | ~12 | Best — Markdown-KV pattern | Best |
-| TSV | ~15 | OK but less natural | OK |
-The markdown numbered list matches the **Markdown-KV** pattern that scored highest (60.7%) in flat data benchmarks. It's also **77% fewer tokens** than current JSON.
-Additional advantages:
-- File path is visually prominent (it's what the LLM acts on next)
-- Score at 2 decimals is sufficient ranking signal
-- Keywords give semantic context without opening the file
-- Zero structural noise (no braces, brackets, quotes, commas)
-- Matches how Perplexity/Context7 format their responses
-No significant trade-offs: we don't need machine-parseability (the consumer is always an LLM), and there's no nested data to worry about.
----
-## Recommendation
-**Switch to markdown numbered list.**
-```
-Search: "authentication flow" — 3 result(s)
-1. src/components/auth.ts (0.87) — authentication, login, session, token
-2. src/middleware/jwt.ts (0.81) — jwt, token, verify, middleware
-3. src/routes/login.ts (0.74) — login, form, credentials, redirect
-```
-Implementation in `mcpApp.ts`:
-```ts
-const header = `Search: "${query}" — ${results.length} result(s)\n\n`;
-const lines = results.map((r, i) =>
-    `${i + 1}. ${r.id} (${r.score.toFixed(2)}) — ${r.meta.keywords ?? ""}`
-);
-const text = header + lines.join("\n");
-return {content: [{type: "text" as const, text}]};
-```
-Empty case: `No results for "${query}"` — avoids confusing the model with an empty list.
-**Future consideration**: Add `outputSchema` + `structuredContent` when clients start using it, but keep TextContent as the primary format for LLM consumption.
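
Reviewer note: the snippet in the deleted file depends on `query` and `results` from its surrounding handler, so it cannot be run on its own. A self-contained sketch of the same formatting, including the empty case the note describes, might look like this. The `SearchResult` shape is an assumption inferred from the fields the note mentions (`id`, `score`, `meta.keywords`), and `formatSearchResults` is a hypothetical name, not a function from the package.

```typescript
// Assumed result shape, inferred from the research note; not the package's real type.
interface SearchResult {
  id: string; // file path, per the note ("id and meta.file are redundant")
  score: number;
  meta: { keywords?: string };
}

// The note's sample output separates fields with an em dash; written as an
// escape here so the separator is explicit.
const DASH = "\u2014";

// Render search results as the numbered key/value list recommended in the note.
function formatSearchResults(query: string, results: SearchResult[]): string {
  // Empty case from the note: a plain sentence instead of an empty list.
  if (results.length === 0) return `No results for "${query}"`;

  const header = `Search: "${query}" ${DASH} ${results.length} result(s)\n\n`;
  const lines = results.map(
    (r, i) =>
      `${i + 1}. ${r.id} (${r.score.toFixed(2)}) ${DASH} ${r.meta.keywords ?? ""}`,
  );
  return header + lines.join("\n");
}
```

In an MCP tool handler, the returned string would go into the text content item, as in the note's `{content: [{type: "text", text}]}` return value.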

package/.ai/task/INDEX.md DELETED

@@ -1,12 +0,0 @@
-# Tasks
-- [xindex-mcp — MCP Server for Semantic Code Search](task.2026-04-10-xindex-mcp.md) — wrap xindex as MCP server so Claude Code can search codebase
-- [Directory-based Indexing with Async Streams](task.2026-04-10-dir-indexing.md) — accept files/dirs, recursive walk with .gitignore, index via streamx pipeline
-- [xindex-watch — Continuous Indexing](task.2026-04-10-watch-indexing.md) — new entry point: index all + watch for changes continuously via merged stream
-- [Object Store — Separate Meta from Vectra](task.2026-04-10-object-store.md) — store meta as JSON files in .xindex/objects/, vectra keeps only vectors
-- [Line-level Clustering](task.2026-04-10-line-clustering.md) — recursive bisection to split files into semantic blocks, index as file:fromLine-toLine
-- [Search Config — Keyword Ignore & Inline Snippets](task.2026-04-10-search-config.md) — `.xindex.json` config for ignoring noisy keywords + inlining small code clusters in results
-- [Cluster Config — Move ClusterLines defaults to .xindex.json](task.2026-04-10-cluster-config.md) — repo-level clustering params (`threshold`, `minLines`, `maxDepth`) instead of hardcoded defaults
-See [done/INDEX.md](done/INDEX.md) for completed tasks.