xindex 1.0.0 → 1.0.2
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/.ai/research/.gitkeep +0 -0
- package/.ai/task/.gitkeep +0 -0
- package/README.md +54 -89
- package/apps/run.search.ts +0 -3
- package/componets/index/formatSearchResults.ts +2 -2
- package/media/MEDIUM.md +139 -0
- package/media/SOCIAL.md +102 -0
- package/package.json +1 -1
- package/.ai/research/2026-04-10-file-watching.md +0 -79
- package/.ai/research/2026-04-10-mcp-output-format.md +0 -129
- package/.ai/task/INDEX.md +0 -12
- package/.ai/task/done/INDEX.md +0 -3
- package/.ai/task/done/task.2026-04-09-local-ai-research-protos.log.md +0 -98
- package/.ai/task/done/task.2026-04-09-local-ai-research-protos.md +0 -102
- package/.ai/task/task.2026-04-10-cluster-config.log.md +0 -19
- package/.ai/task/task.2026-04-10-cluster-config.md +0 -118
- package/.ai/task/task.2026-04-10-dir-indexing.log.md +0 -8
- package/.ai/task/task.2026-04-10-dir-indexing.md +0 -92
- package/.ai/task/task.2026-04-10-line-clustering.log.md +0 -50
- package/.ai/task/task.2026-04-10-line-clustering.md +0 -176
- package/.ai/task/task.2026-04-10-object-store.log.md +0 -7
- package/.ai/task/task.2026-04-10-object-store.md +0 -81
- package/.ai/task/task.2026-04-10-search-config.log.md +0 -46
- package/.ai/task/task.2026-04-10-search-config.md +0 -274
- package/.ai/task/task.2026-04-10-watch-indexing.log.md +0 -32
- package/.ai/task/task.2026-04-10-watch-indexing.md +0 -101
- package/.ai/task/task.2026-04-10-xindex-mcp.log.md +0 -5
- package/.ai/task/task.2026-04-10-xindex-mcp.md +0 -92
- package/.ai/task/task.2026-04-10-xindex-mcp.report.md +0 -113
@@ -1,176 +0,0 @@

# Task: Line-level clustering for block-granular search

## Context

**Current state**: xindex indexes one vector per file. The `id` is the file path, keywords are extracted from the entire file content, and search returns file-level matches. This is too coarse — a 500-line file with mixed concerns returns as a single hit with no indication of *where* in the file the match is.

**User's idea**: split files into semantically coherent blocks (clusters of lines), then index each block separately so search returns `file:fromLine-toLine` references.

**Approach — extend existing pipeline with recursive bisection**:
1. Keep existing file-level indexing intact — `indexContent(filePath, keywords, meta)` runs first, unchanged
2. After file-level index: split file content into lines
3. Bisect into 2 halves → extract keywords for each → embed → compute cosine similarity (dot product of normalized vectors)
4. If similarity is high (≥ 0.6) → cohesive, no clustering needed. If low → 2 separate clusters.
5. Recurse: split each cluster into 2 again, test overlap, stop when clusters are cohesive or hit limits (max depth 4 → up to 16 clusters, min 5 lines per cluster)
6. If only 1 cluster (whole file is cohesive) → skip clustering, file-level entry is enough
7. If 2+ clusters → index each in persistent Vectra as `<file>:<fromLine>-<toLine>` alongside the file-level entry
8. Write a manifest at `<file>::manifest` tracking cluster IDs for cleanup on re-index
9. Both file-level and cluster-level entries coexist — search may return both

**Key files (change targets)**:
- `componets/index/indexFileContent.ts` — **main change site**: currently calls `indexContent(id, keywords, meta)` once per file. Will call `clusterLines` then loop over clusters.
- `componets/index/handleFileEvent.ts` — calls `removeContent(path)` on file change. Must delete all clusters for a file, not just one ID.
- `componets/index/indexContent.ts` — low-level: embeds + upserts one item. No change needed — called per cluster.
- `componets/index/removeContent.ts` — low-level: deletes one item. No change needed — called per cluster ID.
- `componets/index/searchContentIndex.ts` — returns `IIndexRecord{score, id, meta}`. No change needed — `id` becomes `file:1-27` naturally.
- `componets/index/indexMeta.ts` — `IIndexMeta{keywords, id}`. Add `type` tag, add `IClusterMeta`, `IFileManifest` using `IType<>` tagged union.
- `componets/index/objectStore.ts` — stores `IIndexMeta` as JSON, keyed by MD5(id). Needs a manifest entry per file to track cluster IDs.
- `componets/index/contentIndexDriver.ts` — wires components together. Must construct `ClusterLines`, `IndexFileContent`, `RemoveFileContent` inside. Currently `IndexFileContent` is constructed by callers.
- `componets/buildComponents.ts` — top-level builder. Must return `indexFileContent` + `removeFileContent` from driver.
- `apps/indexApp.ts` — bulk indexer. Calls `indexFileContent` directly via stream (no `HandleFileEvent`). Needs `removeFileContent` for cleanup.
- `apps/run.index.ts`, `apps/run.watch.ts`, `apps/run.mcp.ts` — entry points. Currently construct `IndexFileContent` manually. Will use driver-provided version.
- `componets/index/vectraIndex.ts` — creates `LocalIndex(path)`. No change.
- `componets/llm/embed.ts` — MiniLM-L6 embeddings, returns `number[]`. No change.
- `test-vectra-memory.ts` — proved VirtualFileStorage works for in-memory cosine queries.

**Raw notes**: recursive split → 2 → 4 → 8 → 16 hard stop. Overlap by keywords via embedding cosine similarity. Final clusters get indexed in persistent store with line references. MCP query returns lines.

## Goal

Extend the existing indexing pipeline with a `ClusterLines` component (HOF pattern) that takes file content, splits it into semantically coherent line clusters using recursive bisection with embedding cosine similarity, and returns cluster descriptors `{fromLine, toLine, content, keywords}[]`. The existing file-level index stays intact — clustering adds block-level entries alongside it.

## Diagram

```
handleFileEvent (file change/add)
│
├── removeFileContent(path)           ◄── clean ALL old data first
│   ├── removeContent(path)               delete file-level vectra + meta
│   └── read manifest(path::manifest)     if exists:
│       ├── removeContent(path:1-10)          delete each cluster
│       ├── removeContent(path:11-25)
│       └── objectStore.remove(manifest)
│
└── indexFileContent(path, text)      ◄── create ALL new data
    │
    ├── EXISTING: file-level index (unchanged)
    │     extractKeywords + cleanUpKeywords(text)
    │     indexContent(path, keywords, {keywords, id: path})
    │       ├── embed(keywords) → vector
    │       ├── vectra.upsertItem({id: path, vector})
    │       └── objectStore.write(path, meta)
    │
    ├── NEW: cluster-level index (extension)
    │     clusterLines(lines, path)
    │         │
    │         ▼
    │     ┌─────────────────┐
    │     │ Split in half   │
    │     │ lines[0..n/2]   │
    │     │ lines[n/2..n]   │
    │     └────────┬────────┘
    │              │
    │              ▼
    │     ┌──────────────────────────┐
    │     │ Extract keywords each    │
    │     │ Embed keywords → vec     │
    │     │ cosine(vecA, vecB)       │
    │     └────────┬─────────────────┘
    │              │
    │         sim ≥ 0.6? ──yes──► 1 cluster (leaf)
    │              │
    │              no → recurse each half   ◄── depth ≤ 4, min 5 lines
    │              │
    │              ▼
    │     clusters[] = {fromLine, toLine, content, keywords}[]
    │
    │     clusters.length ≤ 1? → SKIP (file entry is enough)
    │
    │     clusters.length > 1? → for each cluster:
    │         indexContent(id="path:12-45", cluster.keywords, clusterMeta)
    │
    └── objectStore.write(path::manifest, {clusterIds})

Three key types in store:
  path            → file-level entry   (vectra + objectStore)
  path:1-10       → cluster entry      (vectra + objectStore)
  path::manifest  → {type:"manifest", clusterIds}   (objectStore only)
```

## Steps

### 1. ClusterLines component — NEW `componets/index/clusterLines.ts`
1. **Cosine helper** — `cosine(a: number[], b: number[]): number` — dot product of two normalized vectors. Pure function, no deps.
2. **HOF factory** — `ClusterLines({embed, extractKeywords, cleanUpKeywords, threshold, minLines, maxDepth})` returns `IClusterLines(lines: string[], file: string) → Promise<ILineCluster[]>`. Defaults: threshold=0.6, minLines=5, maxDepth=4.
3. **ILineCluster type** — `{fromLine: number, toLine: number, content: string, keywords: string}`. `fromLine`/`toLine` are 1-based line numbers.
4. **Recursive bisection** — split lines at midpoint → join each half → `extractKeywords` + `cleanUpKeywords` → `embed` each → `cosine(vecA, vecB)`. If sim ≥ threshold → leaf cluster. If sim < threshold → recurse on each half.
5. **Guards** — `lines.length ≤ minLines` or `depth ≥ maxDepth` → leaf. Empty lines → return `[]`. Either half has no keywords → leaf.
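The ClusterLines steps above can be sketched as follows. This is an illustrative sketch, not the shipped code: the dependency shapes, the synchronous keyword helpers, and the dropped `file` parameter are assumptions made to keep the example self-contained.

```typescript
// Sketch of the planned ClusterLines HOF (assumed defaults from this plan:
// threshold 0.6, minLines 5, maxDepth 4).
export interface ILineCluster {
  fromLine: number; // 1-based, inclusive
  toLine: number;   // 1-based, inclusive
  content: string;
  keywords: string;
}

interface IClusterLinesDeps {
  embed: (text: string) => Promise<number[]>;
  extractKeywords: (text: string) => string;
  cleanUpKeywords: (keywords: string) => string;
  threshold?: number;
  minLines?: number;
  maxDepth?: number;
}

// Dot product of two already-normalized vectors.
export const cosine = (a: number[], b: number[]): number =>
  a.reduce((sum, v, i) => sum + v * b[i], 0);

export const ClusterLines = ({
  embed, extractKeywords, cleanUpKeywords,
  threshold = 0.6, minLines = 5, maxDepth = 4,
}: IClusterLinesDeps) => {
  const keywordsOf = (lines: string[]): string =>
    cleanUpKeywords(extractKeywords(lines.join("\n")));

  const split = async (lines: string[], offset: number, depth: number): Promise<ILineCluster[]> => {
    const leaf = (): ILineCluster[] => [{
      fromLine: offset + 1,
      toLine: offset + lines.length,
      content: lines.join("\n"),
      keywords: keywordsOf(lines),
    }];
    // Guards: small span or depth limit → cohesive leaf
    if (lines.length <= minLines || depth >= maxDepth) return leaf();
    const mid = Math.floor(lines.length / 2);
    const [a, b] = [lines.slice(0, mid), lines.slice(mid)];
    const [ka, kb] = [keywordsOf(a), keywordsOf(b)];
    if (!ka || !kb) return leaf(); // a half with no keywords → leaf
    const [va, vb] = await Promise.all([embed(ka), embed(kb)]);
    if (cosine(va, vb) >= threshold) return leaf(); // halves overlap → cohesive
    return [
      ...(await split(a, offset, depth + 1)),
      ...(await split(b, offset + mid, depth + 1)),
    ];
  };

  return (lines: string[]): Promise<ILineCluster[]> =>
    lines.length === 0 ? Promise.resolve([]) : split(lines, 0, 0);
};
```

With a stub `embed`, a 12-line file whose halves embed to orthogonal vectors comes back as two clusters with 1-based ranges `1-6` and `7-12`, matching the recursion described above.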

### 2. Extend metadata — MODIFY `componets/index/indexMeta.ts` + `objectStore.ts`
1. **Tag IIndexMeta** — add `type: "meta"` field using `IType<>` pattern: `IType<{type: "meta", keywords: string, id: string}>`. Breaking change — all constructors must add `type: "meta"`.
2. **Add IClusterMeta type** — `IType<{type: "cluster", keywords: string, id: string, fromLine: number, toLine: number}>`. Cluster-level entries with line ranges.
3. **Add IFileManifest type** — `IType<{type: "manifest", id: string, clusterIds: string[]}>`. Stored at key `filePath::manifest` in object store.
4. **IStoreEntry union** — `IIndexMeta | IClusterMeta | IFileManifest`. Discriminated by `type` field.
5. **Widen objectStore types** — `IObjectStore.write`/`read` accept/return `IStoreEntry`.
6. **Update indexContent.ts** — widen `meta` param from `IIndexMeta` to `IIndexMeta | IClusterMeta`.
7. **Update searchContentIndex.ts** — narrow `IStoreEntry` by `type` when reading results from object store.
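A minimal sketch of the tagged union from step 2, with `IType<>` stubbed as an identity type (an assumption; the project's real helper may differ) and a hypothetical `describeEntry` helper illustrating narrowing by the `type` discriminant:

```typescript
// Stub of the project's IType<> tagged-union helper, assumed here to be an
// identity type so the sketch stands alone.
type IType<T extends { type: string }> = T;

export type IIndexMeta = IType<{ type: "meta"; keywords: string; id: string }>;
export type IClusterMeta = IType<{
  type: "cluster"; keywords: string; id: string; fromLine: number; toLine: number;
}>;
export type IFileManifest = IType<{ type: "manifest"; id: string; clusterIds: string[] }>;

// Discriminated union stored in the object store.
export type IStoreEntry = IIndexMeta | IClusterMeta | IFileManifest;

// Hypothetical example of narrowing by the `type` field, as searchContentIndex
// would when reading results back.
export const describeEntry = (entry: IStoreEntry): string => {
  switch (entry.type) {
    case "meta": return entry.id;
    case "cluster": return `${entry.id} (${entry.fromLine}-${entry.toLine})`;
    case "manifest": return `${entry.id} → ${entry.clusterIds.length} cluster(s)`;
  }
};
```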

### 3. RemoveFileContent — NEW `componets/index/removeFileContent.ts`
1. **HOF factory** — `RemoveFileContent({removeContent, objectStore})` returns `IRemoveFileContent(filePath: string) => Promise<void>`.
2. **Deletes all layers** — (a) `removeContent(filePath)` to delete file-level vectra item + meta. (b) Read manifest at `filePath::manifest` → if exists, `removeContent(clusterId)` for each → `objectStore.remove(manifestKey)`. (c) All deletes wrapped in try/catch — missing entries are fine (first-time index, no clusters).
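Step 3 could look roughly like this; the `objectStore` and `removeContent` shapes are assumptions based on this plan, and every delete swallows its error so missing entries stay harmless:

```typescript
// Sketch of the RemoveFileContent HOF described above (assumed dep shapes).
interface IRemoveFileContentDeps {
  removeContent: (id: string) => Promise<void>;
  objectStore: {
    read: (id: string) => Promise<{ clusterIds?: string[] } | undefined>;
    remove: (id: string) => Promise<void>;
  };
}

export const RemoveFileContent = ({ removeContent, objectStore }: IRemoveFileContentDeps) =>
  async (filePath: string): Promise<void> => {
    // (a) file-level entry — may not exist yet on a first-time index
    await removeContent(filePath).catch(() => undefined);
    // (b) manifest → each cluster entry → the manifest itself
    const manifestKey = `${filePath}::manifest`;
    const manifest = await objectStore.read(manifestKey).catch(() => undefined);
    for (const clusterId of manifest?.clusterIds ?? []) {
      await removeContent(clusterId).catch(() => undefined);
    }
    await objectStore.remove(manifestKey).catch(() => undefined);
  };
```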

### 4. Update indexFileContent — MODIFY `componets/index/indexFileContent.ts`
1. **Add deps** — `{extractKeywords, cleanUpKeywords, indexContent, clusterLines, objectStore}`. Existing deps stay — file-level index needs `extractKeywords`/`cleanUpKeywords`.
2. **File-level index (EXISTING, now tagged)** — `extractKeywords(content)` → `cleanUpKeywords` → `indexContent(id, keywords, {type: "meta", keywords, id})`. Runs first, always.
3. **Cluster-level index (NEW, extension)** — `content.split("\n")` → `clusterLines(lines, id)` → if `clusters.length ≤ 1` → skip (file is cohesive). If `clusters.length > 1` → for each cluster: `indexContent(\`${id}:${fromLine}-${toLine}\`, cluster.keywords, {type: "cluster", ...})`.
4. **Write manifest** — after all clusters indexed, `objectStore.write(id + "::manifest", {type: "manifest", id, clusterIds})`.
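Putting steps 1-4 of this section together, a sketch of the extended `indexFileContent` (dep shapes are assumptions from this plan; the cluster's `content` field is omitted since only its keywords get embedded here):

```typescript
// Sketch of the extended indexFileContent HOF described above.
interface ICluster { fromLine: number; toLine: number; keywords: string }
interface IIndexFileContentDeps {
  extractKeywords: (text: string) => string;
  cleanUpKeywords: (keywords: string) => string;
  indexContent: (id: string, keywords: string, meta: object) => Promise<void>;
  clusterLines: (lines: string[], file: string) => Promise<ICluster[]>;
  objectStore: { write: (id: string, entry: object) => Promise<void> };
}

export const IndexFileContent = (deps: IIndexFileContentDeps) =>
  async (id: string, content: string): Promise<void> => {
    const { extractKeywords, cleanUpKeywords, indexContent, clusterLines, objectStore } = deps;
    // 1. EXISTING file-level entry, now tagged with type: "meta"
    const keywords = cleanUpKeywords(extractKeywords(content));
    await indexContent(id, keywords, { type: "meta", keywords, id });
    // 2. NEW cluster-level entries — skipped when the whole file is cohesive
    const clusters = await clusterLines(content.split("\n"), id);
    if (clusters.length <= 1) return;
    const clusterIds: string[] = [];
    for (const { fromLine, toLine, keywords: kw } of clusters) {
      const clusterId = `${id}:${fromLine}-${toLine}`;
      clusterIds.push(clusterId);
      await indexContent(clusterId, kw, { type: "cluster", keywords: kw, id: clusterId, fromLine, toLine });
    }
    // 3. manifest for cleanup on re-index
    await objectStore.write(`${id}::manifest`, { type: "manifest", id, clusterIds });
  };
```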

### 5. Wire through driver + builder — MODIFY `contentIndexDriver.ts` + `buildComponents.ts`
1. **contentIndexDriver.ts** — instantiate `ClusterLines({embed, extractKeywords, cleanUpKeywords})`. Construct `IndexFileContent({extractKeywords, cleanUpKeywords, indexContent, clusterLines, objectStore})` inside driver (currently constructed by callers). Construct `RemoveFileContent({removeContent, objectStore})`. Add `indexFileContent` + `removeFileContent` to `IContentIndexDriver`.
2. **buildComponents.ts** — destructure `indexFileContent` + `removeFileContent` from `ContentIndexDriver`. Return them. Callers no longer construct `IndexFileContent` themselves.

### 6. Update callers — MODIFY `run.*.ts` + `handleFileEvent.ts` + `indexApp.ts`
1. **run.index.ts, run.watch.ts, run.mcp.ts** — remove `IndexFileContent(...)` construction. Get `indexFileContent` + `removeFileContent` from `BuildComponents()`.
2. **handleFileEvent.ts** — replace `removeContent` dep with `removeFileContent`. On `FileEventType.index`: `removeFileContent(path)` first (clean old data), then `indexFileContent(path, text)` (creates file entry + cluster entries). On `FileEventType.remove`: `removeFileContent(path)`.
3. **indexApp.ts** — currently calls `indexFileContent(id, text)` directly via stream pipeline (no `HandleFileEvent`). Add `removeFileContent` dep. Call `removeFileContent(id)` before `indexFileContent(id, text)` in the `map` callback — otherwise old clusters linger when cluster boundaries change on re-index. Update `IndexApp({walkFiles, indexFileContent, removeFileContent, log})`.
4. **Import paths** — `run.index.ts` imports `IndexFileContent` from `componets/index/indexFileContent.js`. After moving construction inside driver, remove this import. Same for `run.watch.ts` and `run.mcp.ts`.

### 7. Test end-to-end
1. **Unit test clusterLines** — feed a file with 2 distinct sections (imports+types vs. implementation), verify ≥2 clusters with correct 1-based line ranges.
2. **Integration test** — index a multi-concern file, query for a specific concept, verify search returns both `file.ts` (file-level) and `file.ts:12-45` (cluster-level).
3. **Re-index test** — modify file, re-index, verify old clusters deleted + new ones created.
4. **Cohesive file test** — index a small/uniform file, verify only file-level entry exists (no clusters, no manifest).

## Decisions

- **Extend, don't replace** — existing file-level indexing stays intact. Clustering is an additional step that runs after. Both levels coexist in the index.
- **1 cluster = skip** — if the file is cohesive (clustering returns 1 cluster = whole file), no cluster entries are created. File-level entry is enough.
- **Embedding cosine similarity** for bisection (not Jaccard). Jaccard only matches exact keyword strings — `fetchUser` and `getUser` would score 0% overlap despite being the same concern. Embeddings capture meaning. Cost is acceptable: MiniLM-L6 is local, ~30 embed calls per file at max depth, ~50-100ms total.
- **Cosine computation**: Option A — direct dot product (3-line helper, vectors already normalized). Fallback to Option B (in-memory Vectra via `VirtualFileStorage`) if direct cosine proves insufficient.
- **Similarity threshold**: start at 0.55–0.70, tune empirically. Try 0.6 as default.
- **Min cluster size**: 3–5 lines. Use 5 as default, configurable.
- **Three tagged types in store** (using `IType<>` pattern): `IIndexMeta{type:"meta"}` at `filePath`, `IClusterMeta{type:"cluster"}` at `filePath:fromLine-toLine`, `IFileManifest{type:"manifest"}` at `filePath::manifest`. All separate keys, discriminated by `type`.
- **Cleanup on re-index**: `removeFileContent` deletes file-level entry, then reads manifest to delete all cluster entries, then deletes manifest itself. Graceful on missing data.
- **Move IndexFileContent inside driver** — currently constructed by callers in `run.*.ts`. Moving inside `contentIndexDriver.ts` consolidates wiring since the driver already has all deps.

## Research: existing NPM packages

- **semantic-chunking** (jparkerweb, v2.4.4) — splits text into sentences, embeds each with ONNX model, groups by cosine similarity. Sentence-level, not line-level. Uses its own ONNX pipeline, not BYOE.
- **semantic-chunker** (johnhenry) — BYOE approach, bring your own embedding function. More flexible. Could plug in our MiniLM-L6 embed.
- **LangChain RecursiveCharacterTextSplitter** — recursive splitting by character/token boundaries, not semantic. 2026 benchmarks show 512-token recursive splitting at 69% accuracy — good baseline but not meaning-aware.
- **NAACL 2025 finding**: fixed 200-word chunks match or beat semantic chunking for general RAG. But for *code* with mixed concerns in one file, semantic splitting should outperform fixed-size.
- **Verdict**: existing packages target prose (sentence-level). Our use case is code (line-level, preserve line boundaries for references). Custom recursive bisection with our existing embed pipeline is the right call — simpler than adapting a prose chunker to respect line boundaries.

## Edge Cases

- **Small files (≤ 5 lines)** — return as single cluster, no splitting attempted.
- **Empty files** — file-level entry is still indexed (existing pipeline runs first). Clustering returns `[]`, no cluster entries created.
- **Files with uniform content** (e.g., all imports) — cosine similarity stays high at every split, returns 1 cluster. Expected behavior.
- **Binary/non-text files** — already filtered upstream by the file walker. Not a concern here.
- **Legacy index data** — files indexed before this change won't have manifests. On re-index, no old clusters to delete — just index fresh.

## Open Questions

- **Keyword extraction quality**: current keywords come from compromise NLP + keyword-extractor. May need tuning for code (variable names, imports, function signatures).
- **Threshold tuning**: need to test 0.55 vs 0.60 vs 0.70 on real project files to find the sweet spot.
- ~~**Object store dual use**~~ — resolved: three tagged types (`IIndexMeta`, `IClusterMeta`, `IFileManifest`) discriminated by `type` field, stored at separate keys. Union `IStoreEntry = IIndexMeta | IClusterMeta | IFileManifest`.
@@ -1,7 +0,0 @@

### 2026-04-10 — Task created

- Scouted: vectra currently stores vector + IIndexMeta (keywords, file) together
- User wants to separate: vectra for vectors only, .xindex/objects/ for meta JSON
- Hash-based path: md5(id) → xx/yy/xxyyzz.json
- Need to update indexContent, searchContentIndex, resetIndex, contentIndexDriver
- New components: objectStore (read/write/clear), indexStructure (manage .xindex/ dirs)
@@ -1,81 +0,0 @@

# Task: Object Store — Separate Meta Storage from Vectra

## Context

Currently vectra stores both vectors AND metadata (`{keywords, file}`) in the same index. Vectra is good for semantic search, not for storage. Goal: split storage into two layers:

- **`.xindex/semantic/`** — vectra stores only vectors + id (for search)
- **`.xindex/objects/`** — file-based JSON store for meta objects (for storage/retrieval)

**Current state:**
- `indexContent.ts` — embeds content, upserts `{id, vector, metadata: IIndexMeta}` into vectra
- `searchContentIndex.ts` — queries vectra, reads `r.item.metadata as IIndexMeta`
- `resetIndex.ts` — `deleteIndex()` + `createIndex()` on vectra only
- `IIndexMeta = {keywords: string, file: string}`
- Index path: `.xindex` (single vectra folder)

**New structure:**
```
.xindex/
├── semantic/   ← vectra (vectors + id only, minimal meta)
└── objects/    ← JSON files keyed by hash of id
    └── xx/
        └── yy/
            └── xxyyzz.json   ← {keywords, file, ...}
```

## Goal

Introduce an object store layer that writes `IIndexMeta` as JSON files in `.xindex/objects/`, remove metadata from vectra (keep only vector + id), and decorate `indexContent` and `searchContentIndex` to read/write both layers.

## Diagram

```
INDEX PIPELINE:
file → extractKeywords → cleanUp → keywords
        │
        ├── [1] embed(keywords) → vector
        │       └── vectra.upsert({id, vector})   → .xindex/semantic/
        │
        └── [2] objectStore.write(id, meta)       → .xindex/objects/xx/yy/xxyyzz.json
                {keywords, file}

SEARCH PIPELINE:
query → extractKeywords → cleanUp → embed → vector
        │
        ├── [1] vectra.query(vector, limit) → [{score, id}]
        │
        └── [2] objectStore.read(id) → IIndexMeta
                ↓
          [{score, id, meta}]

RESET:
[1] vectra.deleteIndex + createIndex   → .xindex/semantic/ wiped
[2] rm -rf .xindex/objects/            → objects wiped
```

## Steps

### 1. Object Store HOF
- Create `componets/index/objectStore.ts` — `ObjectStore({basePath}): IObjectStore`
- `write(id, meta)` — hash id (md5 → hex), split into `xx/yy/xxyyzz`, `mkdir -p`, write JSON
- `read(id)` — hash id, read JSON, parse as `IIndexMeta`
- `clear()` — rm -rf basePath, recreate empty dir
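The hash-path scheme above can be sketched with Node built-ins only. `hashPath` is a hypothetical helper name, and the store's exact API is an assumption from this plan:

```typescript
// Sketch of the proposed ObjectStore: md5(id) → hex → xx/yy/<hex>.json
// fanned out under basePath.
import { createHash } from "node:crypto";
import { mkdir, readFile, writeFile, rm } from "node:fs/promises";
import { dirname, join } from "node:path";

export const hashPath = (basePath: string, id: string): string => {
  const hex = createHash("md5").update(id).digest("hex");
  return join(basePath, hex.slice(0, 2), hex.slice(2, 4), `${hex}.json`);
};

export const ObjectStore = ({ basePath }: { basePath: string }) => ({
  // write: ensure the two-level fanout dirs exist, then dump JSON
  write: async (id: string, meta: object): Promise<void> => {
    const file = hashPath(basePath, id);
    await mkdir(dirname(file), { recursive: true });
    await writeFile(file, JSON.stringify(meta));
  },
  // read: re-derive the path from the id and parse the JSON back
  read: async (id: string): Promise<unknown> =>
    JSON.parse(await readFile(hashPath(basePath, id), "utf8")),
  // clear: wipe and recreate the whole objects dir
  clear: async (): Promise<void> => {
    await rm(basePath, { recursive: true, force: true });
    await mkdir(basePath, { recursive: true });
  },
});
```

The two-level fanout keeps any single directory small even for large repos, a common trick borrowed from git's object store.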

### 2. Update Index Structure
- Create `componets/index/indexStructure.ts` — `IndexStructure({basePath}): IIndexStructure`
- Manages `.xindex/` top-level: ensures `semantic/` and `objects/` dirs exist
- Returns paths: `{semanticPath, objectsPath}`
- Used by `contentIndexDriver` at init

### 3. Decorate Index/Search
- Update `IndexContent` — upsert vector+id to vectra (no meta), write meta to objectStore
- Update `SearchContentIndex` — query vectra for `{score, id}[]`, then `objectStore.read(id)` for each result to attach meta
- Update `ResetIndex` — call both `vectra.deleteIndex/createIndex` and `objectStore.clear()`
- Update `ContentIndexDriver` — pass `semanticPath` to `VectraIndex`, create `ObjectStore({basePath: objectsPath})`
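A sketch of the decorated search from Step 3: vectra supplies `{score, id}`, the object store supplies the meta. The dep names here (`embedQuery`, `queryVectors`) are assumptions that compress the keyword-extraction and embedding steps into single functions:

```typescript
// Sketch of the two-layer search join described in Step 3.
interface IIndexRecord { score: number; id: string; meta: unknown }
interface ISearchDeps {
  embedQuery: (query: string) => Promise<number[]>;
  queryVectors: (vector: number[], limit: number) => Promise<{ score: number; id: string }[]>;
  objectStore: { read: (id: string) => Promise<unknown> };
}

export const SearchContentIndex = ({ embedQuery, queryVectors, objectStore }: ISearchDeps) =>
  async (query: string, limit: number): Promise<IIndexRecord[]> => {
    const vector = await embedQuery(query);
    const hits = await queryVectors(vector, limit);
    // One object-store read per hit; batching is an open question below.
    return Promise.all(
      hits.map(async ({ score, id }) => ({ score, id, meta: await objectStore.read(id) })),
    );
  };
```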

## Open Questions

- Hash function: `crypto.createHash('md5')` from Node built-in — fast enough, no deps. Or use simpler hash?
- Should objectStore support partial updates (upsert) or always overwrite?
- Should search batch-read objects or read one by one per result?
@@ -1,46 +0,0 @@

### 2026-04-10

- Task created from user notes
- Scouted codebase via xindex search (indexed 167 files)
- `.xindex.json` exists but empty `{}`, no config loading anywhere
- `CleanUpKeywords` at `componets/keywords/cleanUpKeywords.ts:8` — HOF takes `{maxNgrams, minLength}`. Add `ignoreKeywords` here.
- `SearchContentIndex` at `componets/index/searchContentIndex.ts:12` — search pipeline, uses `cleanUpKeywords` on query. Ignore list propagates automatically.
- `IClusterMeta` at `componets/index/indexMeta.ts:11` — has `fromLine`/`toLine` for reading snippet lines
- MCP tool at `apps/mcpApp.ts:34` — `xindex_search` schema has `{query, limit}` only. Add snippet params.
- CLI at `apps/run.search.ts:23-31` — formats results with score + keywords, no snippets
- `BuildComponents` at `componets/buildComponents.ts:6` — wires everything, no config loading. Config loads here.
- `ContentIndexDriver` at `componets/index/contentIndexDriver.ts:27` — passes `cleanUpKeywords` to `ClusterLines` and `SearchContentIndex`
- Entry points: `apps/run.mcp.ts:19` (MCP), `apps/run.search.ts:8` (CLI) — both call `BuildComponents()`
- User wants explicit config names: `ignoreKeywords`, `snippetLines`, `snippetResults`

**Clarification round — decisions:**
- Defaults confirmed: `snippetResults: 3`, `snippetLines: 7`
- `ignoreKeywords`: exact strings, case-insensitive. No globs/patterns.
- Ignore at **index time** — re-index + MCP restart after config change is acceptable. One-time setup, review in 3mo.
- File-level results (whole file, no cluster) also get snippets if file total lines ≤ `snippetLines`
- Task finalized

**Round 2 — user feedback during detail expansion:**
- Renamed `snippetLines` → `maxSnippetLines`, `snippetResults` → `maxSnippetResults` (user preference for explicit names)
- Added `ignoreFiles` feature: gitignore-style glob patterns in `.xindex.json` to exclude files from indexing. Reuses existing `ignore` package already in `walkFiles.ts:3` and `watchFiles.ts`
- Expanded task from 3x3 to 4x3 to accommodate file ignore list as separate step
- Traced all WalkFiles/WatchFiles consumers: `run.mcp.ts`, `run.index.ts`, `run.watch.ts` — all need `ignoreFiles` plumbed
- Task ready for implementation

**Round 3 — consistency check (7 findings, all fixed):**
- [Missing] `.xindex.json` is optional — added to Decisions + diagram label
- [Drift] Diagram only showed WalkFiles — added WatchFiles
- [Mismatch] Step 2.3 duplicated validation from 1.2 — removed 2.3, kept in 1.2 only
- [Mismatch] `console.warn` in LoadConfig violates project `ILogger` pattern — added `log: ILogger` dep to LoadConfig and BuildComponents
- [Drift] Files Changed table had tentative "(if it creates its own)" for run.index.ts — made definitive
- [Inconsistency] Step 4.2 parsed fromLine/toLine from ID string — uses `meta.fromLine`/`meta.toLine` directly now
- [Missing] Step 1.3 vague "WalkFiles consumers" — listed all 5 specific construction sites (run.mcp.ts:18,30, run.index.ts:10, run.watch.ts:13,14)

**Round 4 — implementation:**
- Implemented all 12 files (3 new, 9 modified) + run.reset.ts (missed in plan, also calls BuildComponents)
- Phase 1: config type + loadConfig HOF
- Phase 2: cleanUpKeywords ignoreSet, walkFiles + watchFiles ignoreFiles
- Phase 3: readSnippet HOF
- Phase 4: buildComponents wiring ({log} param, config loading, return config)
- Phase 5: all entry points updated (run.mcp, run.search, run.index, run.watch, run.reset)
- Verified: keyword ignore filters noisy words, file ignore excludes rnd/**, snippets show for small results (top 3, ≤7 lines)
@@ -1,274 +0,0 @@

# Task: Search Result Config — Keyword Ignore, File Ignore & Inline Code Snippets

## Context

Three improvements to xindex, all configurable via `.xindex.json`:

1. **Keyword ignore list** — exclude noisy keywords at index time (case-insensitive exact match). Improves grouping relevance. Requires re-index after config change — acceptable as one-time setup.
2. **File ignore list** — gitignore-style glob patterns to exclude files from indexing. Same semantics as `.gitignore` but defined in `.xindex.json`. Applied in `WalkFiles` and `WatchFiles` alongside existing `.gitignore` rules.
3. **Inline code snippets** — when a search result is small (≤ N lines), include actual source code in the output. Configurable via `.xindex.json` defaults + MCP tool parameter overrides.

**Current state:**
- `.xindex.json` exists but is empty `{}` — file is optional, may not exist at all
- Keywords: `compromise` NLP → `keyword-extractor` cleanup in `componets/keywords/cleanUpKeywords.ts`
- File walking: `componets/walkFiles.ts` uses `ignore` package for `.gitignore` rules; `componets/watchFiles.ts` has its own `loadGitignore` + `ignore()` at line 22-31
- Search results: 1-line summaries only (`1. path:from-to (score) — keywords`)
- Cluster metadata stores `fromLine`/`toLine` in `componets/index/indexMeta.ts:11` (`IClusterMeta`)
- MCP `xindex_search` accepts `query` and `limit` only
- No config loading exists
- Entry points that create `WalkFiles`/`WatchFiles`: `run.mcp.ts:18,30`, `run.index.ts:10`, `run.watch.ts:13-14`

**Decisions:**
- `.xindex.json` is **optional** — missing file → all defaults, no error
- `ignoreKeywords`: exact strings, case-insensitive. No globs/patterns.
- `ignoreFiles`: gitignore-style glob patterns (reuses existing `ignore` package)
- `maxSnippetResults: 3`, `maxSnippetLines: 7` — confirmed defaults
- Ignore list applied at **index time** — re-index + MCP restart required after config change. One-time setup, review in 3mo.
- File-level results (no cluster) **also get snippets** if total file lines ≤ `maxSnippetLines`
- Config field names are explicit: `ignoreKeywords`, `ignoreFiles`, `maxSnippetLines`, `maxSnippetResults`
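The `ignoreKeywords` decision above (exact strings, case-insensitive) boils down to a lowercased set lookup. `IgnoreKeywordsFilter` is a hypothetical standalone name for illustration; per the plan the real filter lives inside `CleanUpKeywords`:

```typescript
// Sketch of the case-insensitive exact-match keyword filter.
export const IgnoreKeywordsFilter = (ignoreKeywords: string[]) => {
  // Build the lowercased set once, at construction time.
  const ignoreSet = new Set(ignoreKeywords.map((k) => k.toLowerCase()));
  return (keywords: string[]): string[] =>
    keywords.filter((k) => !ignoreSet.has(k.toLowerCase()));
};
```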

## Diagram

```
.xindex.json (optional)                 MCP xindex_search
┌──────────────────────────┐            ┌──────────────────────────────┐
│ ignoreKeywords: [...]    │            │ query, limit                 │
│ ignoreFiles: [...]       │            │ maxSnippetResults: 3         │ ← override
│ maxSnippetLines: 7       │            │ maxSnippetLines: 7           │ ← override
│ maxSnippetResults: 3     │            └──────┬───────────────────────┘
└──────┬───────────────────┘                   │
       │                                       │
       ├─ ignoreFiles ────┐                    │
       │                  ▼                    │
       │       WalkFiles + WatchFiles          │
       │       (skip matching paths)           │
       │                                       │
       ├─ ignoreKeywords ─┐                    │
       │                  ▼                    │
       │         CleanUpKeywords               │
       │         (index time)                  │
       │                                       │
       └─ snippet config ──────────────────────┤
                                               ▼
                                        Format results
                                          ├─ cluster ≤ maxSnippetLines? → readSnippet
                                          └─ file ≤ maxSnippetLines?    → readSnippet

Data flow (indexing):
walkFiles(inputs)   ← ignoreFiles applied here (NEW)
  → readFile
  → ExtractKeywords (compromise NLP)
  → CleanUpKeywords (keyword-extractor + ignoreKeywords filter)   ← NEW
  → embed → vectra upsert + objectStore write

Data flow (search):
query
  → extractKeywords → cleanUpKeywords → embed → vectra query
  → filter by scoreThreshold → objectStore.read for each hit
  → format results + readSnippet for top N small results   ← NEW
```
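The `readSnippet` step in the diagram reduces to a 1-based inclusive line slice. This sketch takes the file content as a string, which is a simplification; the real component would read the file and also gate on `toLine - fromLine + 1 ≤ maxSnippetLines` before including the snippet:

```typescript
// Sketch of the snippet extraction: fromLine/toLine are 1-based and
// inclusive, matching IClusterMeta.
export const readSnippet = (content: string, fromLine: number, toLine: number): string =>
  content.split("\n").slice(fromLine - 1, toLine).join("\n");
```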
|
|
70
|
-
|
|
71
|
-
## Steps
|
|
72
|
-
|
|
73
|
-
### 1. Config schema & loading
|
|
74
|
-
|
|
75
|
-
- **1.1 Define config type** — create `componets/config/xindexConfig.ts`
  ```ts
  export type IXindexConfig = {
    ignoreKeywords: string[];
    ignoreFiles: string[];
    maxSnippetLines: number;
    maxSnippetResults: number;
  };
  ```
  All fields optional in the JSON file; defaults applied at load time.

- **1.2 Load config** — create `componets/config/loadConfig.ts` as HOF
  ```ts
  export type ILoadConfig = () => Promise<IXindexConfig>;
  export function LoadConfig({configPath, log}: {configPath: string, log: ILogger}): ILoadConfig
  ```
  - Read `configPath` (`.xindex.json` in cwd)
  - `JSON.parse`, apply defaults: `{ignoreKeywords: [], ignoreFiles: [], maxSnippetLines: 7, maxSnippetResults: 3}`
  - If file missing or empty → return all defaults (no error)
  - If JSON parse fails → throw with clear message including path
  - Validate: use `log` to warn if any `ignoreKeywords` entry has length ≤ 1 (no `console.*` — project uses `ILogger`)

- **1.3 Wire into BuildComponents** — modify `componets/buildComponents.ts:6-22`
  - Current: creates `embed`, `extractKeywords`, `cleanUpKeywords({maxNgrams: 2, minLength: 2})` then `ContentIndexDriver`
  - `BuildComponents` currently takes no args. Add `{log}: {log: ILogger}` so `LoadConfig` can use it for warnings.
  - Add: `const loadConfig = LoadConfig({configPath: ".xindex.json", log})` → `const config = await loadConfig()`
  - Pass `config.ignoreKeywords` to `CleanUpKeywords`: `CleanUpKeywords({maxNgrams: 2, minLength: 2, ignoreKeywords: config.ignoreKeywords})`
  - Return `config` in the output so callers can access snippet + file ignore settings
  - `BuildComponents` return type gains `config: IXindexConfig`
  - All callers need updating to pass `log` and destructure `config`:
    - `apps/run.mcp.ts:19` — needs `config` for `McpApp` + `ignoreFiles` for `WalkFiles`/`WatchFiles`
    - `apps/run.index.ts:11` — needs `config.ignoreFiles` for `WalkFiles`
    - `apps/run.watch.ts:15` — needs `config.ignoreFiles` for `WalkFiles`/`WatchFiles`
    - `apps/run.search.ts:8` — needs `config` for snippet settings

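The loading behavior specified in 1.2 can be sketched as follows. This is a minimal sketch, not the project's implementation: the `ILogger` shape `(msg: string) => void` and the `DEFAULTS` constant name are assumptions for illustration.

```typescript
import {readFile} from "node:fs/promises";

type ILogger = (msg: string) => void; // assumed shape; the project's ILogger may differ

export type IXindexConfig = {
  ignoreKeywords: string[];
  ignoreFiles: string[];
  maxSnippetLines: number;
  maxSnippetResults: number;
};

const DEFAULTS: IXindexConfig = {
  ignoreKeywords: [],
  ignoreFiles: [],
  maxSnippetLines: 7,
  maxSnippetResults: 3,
};

export type ILoadConfig = () => Promise<IXindexConfig>;

export function LoadConfig({configPath, log}: {configPath: string; log: ILogger}): ILoadConfig {
  return async function loadConfig() {
    let raw: string;
    try {
      raw = await readFile(configPath, "utf8");
    } catch {
      return {...DEFAULTS}; // missing file → all defaults, no error
    }
    if (raw.trim() === "") return {...DEFAULTS}; // empty file → all defaults
    let parsed: Partial<IXindexConfig>;
    try {
      parsed = JSON.parse(raw);
    } catch (e) {
      throw new Error(`Invalid JSON in ${configPath}: ${(e as Error).message}`);
    }
    const config = {...DEFAULTS, ...parsed};
    // Validation: warn (never throw) on keywords too short to be useful
    for (const kw of config.ignoreKeywords) {
      if (kw.length <= 1) log(`warn: ignoreKeywords entry "${kw}" has length <= 1`);
    }
    return config;
  };
}
```

The HOF shape matches the rest of the codebase: dependencies (`configPath`, `log`) are bound once at factory time, and the returned closure is what gets wired into `BuildComponents`.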
### 2. Keyword ignore list
- **2.1 Extend CleanUpKeywords** — modify `componets/keywords/cleanUpKeywords.ts:8`
  - Current signature: `CleanUpKeywords({maxNgrams, minLength}: {maxNgrams: number, minLength: number})`
  - New signature: `CleanUpKeywords({maxNgrams, minLength, ignoreKeywords = []}: {maxNgrams: number, minLength: number, ignoreKeywords?: string[]})`
  - Build `ignoreSet = new Set(ignoreKeywords.map(k => k.toLowerCase()))` at factory time (once, not per call)
  - Add `if (ignoreSet.has(lower)) return false;` into the existing filter chain at lines 21-27, before the `seen` dedup check

Exact change at `cleanUpKeywords.ts`:
  ```ts
  export function CleanUpKeywords({maxNgrams, minLength, ignoreKeywords = []}: {
    maxNgrams: number, minLength: number, ignoreKeywords?: string[]
  }): ICleanUpKeywords {
    const ignoreSet = new Set(ignoreKeywords.map(k => k.toLowerCase()));
    return function cleanUpKeywords(keywords) {
      // ... existing extraction ...
      const seen = new Set<string>();
      return extracted.filter((kw: string) => {
        if (kw.length <= minLength || !/[a-z]/i.test(kw)) return false;
        const lower = kw.toLowerCase();
        if (ignoreSet.has(lower)) return false; // ← NEW
        if (seen.has(lower)) return false;
        seen.add(lower);
        return true;
      });
    }
  }
  ```

- **2.2 Propagation** — single change at `buildComponents.ts:9` propagates to all consumers:
  - `ContentIndexDriver` (`contentIndexDriver.ts:28`) passes `cleanUpKeywords` to:
    - `ClusterLines` (`clusterLines.ts:20`) — uses at `:34-35` (top/bot split keywords) and `:56` (leaf keywords)
    - `IndexFileContent` (`indexFileContent.ts:10`) — uses at `:19` (file-level keywords)
    - `SearchContentIndex` (`searchContentIndex.ts:12`) — uses at `:22` (query keywords)
  - All paths share the same `cleanUpKeywords` instance. No additional wiring needed.

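Because the snippet in 2.1 elides the extraction step, here is a self-contained version of just the filter chain, runnable as-is. The extraction (`keyword-extractor` n-grams) is stubbed out — the function simply filters the keywords it is handed, which is enough to exercise the `ignoreSet` behavior.

```typescript
// Sketch of the CleanUpKeywords filter chain only; extraction is intentionally omitted,
// so maxNgrams is not modeled here.
export function CleanUpKeywordsFilter({minLength, ignoreKeywords = []}: {
  minLength: number; ignoreKeywords?: string[];
}) {
  // Built once at factory time, not per call
  const ignoreSet = new Set(ignoreKeywords.map(k => k.toLowerCase()));
  return function cleanUpKeywords(keywords: string[]): string[] {
    const seen = new Set<string>();
    return keywords.filter(kw => {
      if (kw.length <= minLength || !/[a-z]/i.test(kw)) return false; // too short / no letters
      const lower = kw.toLowerCase();
      if (ignoreSet.has(lower)) return false; // ignoreKeywords filter (the NEW step)
      if (seen.has(lower)) return false;      // case-insensitive dedup
      seen.add(lower);
      return true;
    });
  };
}
```

Ordering matters: the ignore check sits before the dedup so an ignored keyword never occupies a slot in `seen`.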
### 3. File ignore list
- **3.1 Extend WalkFiles** — modify `componets/walkFiles.ts:8`
  - Current signature: `WalkFiles({cwd, log}: {cwd: string, log: ILogger})`
  - New signature: `WalkFiles({cwd, log, ignoreFiles = []}: {cwd: string, log: ILogger, ignoreFiles?: string[]})`
  - In `walk()` at lines 18-22: the `ignore` instance `ig` is already constructed per-directory with accumulated `.gitignore` rules. Add `ignoreFiles` rules after existing rules:
    ```ts
    const ig = ignore();
    for (const rule of rules) ig.add(rule);
    for (const pattern of ignoreFiles) ig.add(pattern); // ← NEW
    ```
  - This makes `ignoreFiles` patterns behave identically to `.gitignore` entries — same glob syntax, same matching semantics (relative paths, directory trailing `/`, negation with `!`)
  - The `ignore` package is already a dependency (`package.json:28`)

- **3.2 Extend WatchFiles** — modify `componets/watchFiles.ts:20`
  - Current signature: `WatchFiles({cwd, log}: {cwd: string, log: ILogger})`
  - New signature: `WatchFiles({cwd, log, ignoreFiles = []}: {cwd: string, log: ILogger, ignoreFiles?: string[]})`
  - In `loadGitignore()` at lines 22-31: creates its own `ignore()` instance per watched directory. Add `ignoreFiles` rules after the `.gitignore` rules:
    ```ts
    async function loadGitignore(dir: string) {
      const ig = ignore();
      ig.add(".*");
      try {
        const content = await readFile(join(dir, ".gitignore"), "utf8");
        ig.add(content);
      } catch {}
      for (const pattern of ignoreFiles) ig.add(pattern); // ← NEW
      return ig;
    }
    ```

- **3.3 Wire ignoreFiles to all entry points** — pass `config.ignoreFiles` at each `WalkFiles`/`WatchFiles` construction:
  - `apps/run.mcp.ts:18` — `WalkFiles({cwd, log})` → `WalkFiles({cwd, log, ignoreFiles: config.ignoreFiles})`
  - `apps/run.mcp.ts:30` — `WatchFiles({cwd, log})` → `WatchFiles({cwd, log, ignoreFiles: config.ignoreFiles})`
  - `apps/run.index.ts:10` — `WalkFiles({cwd, log})` → `WalkFiles({cwd, log, ignoreFiles: config.ignoreFiles})`
  - `apps/run.watch.ts:13` — `WalkFiles({cwd, log})` → `WalkFiles({cwd, log, ignoreFiles: config.ignoreFiles})`
  - `apps/run.watch.ts:14` — `WatchFiles({cwd, log})` → `WatchFiles({cwd, log, ignoreFiles: config.ignoreFiles})`

### 4. Inline code snippets
- **4.1 Add snippet params to MCP** — modify `apps/mcpApp.ts:34-58`
  - `McpApp` factory gains a `config: IXindexConfig` dependency (add to `mcpApp.ts:23` params)
  - Extend the `xindex_search` schema at line 37:
    ```ts
    inputSchema: z.object({
      query: z.string().describe("Natural language search query"),
      limit: z.number().int().min(1).max(100).default(10)
        .describe("Max results to return, 10 by default, 100 max"),
      maxSnippetResults: z.number().int().min(0).max(20).optional()
        .describe("How many top results include inline code (default from .xindex.json, 3)"),
      maxSnippetLines: z.number().int().min(0).max(50).optional()
        .describe("Max lines in a result to qualify for inline code (default from .xindex.json, 7)"),
    }),
    ```
  - In handler: resolve with config fallback:
    ```ts
    const sr = maxSnippetResults ?? config.maxSnippetResults;
    const sl = maxSnippetLines ?? config.maxSnippetLines;
    ```
  - Update `apps/run.mcp.ts:48` — pass `config` to `McpApp`

- **4.2 Read source lines** — create `componets/index/readSnippet.ts`
  ```ts
  export type IReadSnippet = (record: IIndexRecord, maxLines: number) => Promise<string | null>;
  export function ReadSnippet(): IReadSnippet
  ```
  Logic:
  - **Cluster result** (`meta.type === StoreEntryType.cluster`): use `meta.fromLine`/`meta.toLine` directly from `IClusterMeta` (no need to parse from ID). Compute `lineCount = meta.toLine - meta.fromLine + 1`. If `lineCount > maxLines` → return `null`. Extract the file path from `record.id` by splitting on the last `:` (format `"path/to/file.ts:14-27"`). `readFile(filePath, "utf8")`, split lines, slice `[meta.fromLine-1, meta.toLine]`, return joined with `\n`.
  - **File result** (`meta.type === StoreEntryType.meta`): `readFile(record.id, "utf8")`, split lines, count. If `lineCount > maxLines` → return `null`. Otherwise return full content.
  - **Error handling**: on `readFile` failure (file deleted, moved, permission error) → return `null` silently. Search results still display; the snippet is just omitted.

- **4.3 Format with code** — update result formatting in both entry points:

  **MCP** (`apps/mcpApp.ts:47-51`): currently `results.map(...)` builds 1-line summaries. Replace with a loop:
  ```ts
  const readSnippet = ReadSnippet();
  const lines: string[] = [];
  for (let i = 0; i < results.length; i++) {
    const r = results[i];
    const kw = r.meta.keywords ? ` — ${r.meta.keywords}` : "";
    lines.push(`${i + 1}. ${r.id} (${r.score.toFixed(2)})${kw}`);
    if (i < sr) {
      const snippet = await readSnippet(r, sl);
      if (snippet) lines.push("```\n" + snippet + "\n```");
    }
  }
  ```

  **CLI** (`apps/run.search.ts:8-31`): destructure `config` from `BuildComponents()`:
  ```ts
  const {searchContentIndex, config} = await BuildComponents({log});
  ```
  Then the same snippet pattern using `config.maxSnippetResults` and `config.maxSnippetLines`:
  ```ts
  const readSnippet = ReadSnippet();
  // ... in the result loop after existing log lines:
  if (i < config.maxSnippetResults) {
    const snippet = await readSnippet(results[i], config.maxSnippetLines);
    if (snippet) log("```\n" + snippet + "\n```");
  }
  ```

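The `ReadSnippet` logic from 4.2 could be sketched as below. The shapes of `IIndexRecord` and `StoreEntryType` are assumptions reconstructed from this plan, not the project's actual type definitions.

```typescript
import {readFile} from "node:fs/promises";

// Assumed shapes — the real IIndexRecord / StoreEntryType live elsewhere in the codebase
enum StoreEntryType { meta = "meta", cluster = "cluster" }

type IIndexRecord = {
  id: string; // "path/to/file.ts" for files, "path/to/file.ts:14-27" for clusters
  meta: {type: StoreEntryType; fromLine?: number; toLine?: number};
};

export type IReadSnippet = (record: IIndexRecord, maxLines: number) => Promise<string | null>;

export function ReadSnippet(): IReadSnippet {
  return async function readSnippet(record, maxLines) {
    try {
      if (record.meta.type === StoreEntryType.cluster) {
        const {fromLine = 1, toLine = 1} = record.meta;
        if (toLine - fromLine + 1 > maxLines) return null; // cluster too large
        // Strip the ":14-27" suffix by splitting on the last ":"
        const filePath = record.id.slice(0, record.id.lastIndexOf(":"));
        const lines = (await readFile(filePath, "utf8")).split("\n");
        return lines.slice(fromLine - 1, toLine).join("\n");
      }
      // File result: only inline files small enough to fit
      const content = await readFile(record.id, "utf8");
      if (content.split("\n").length > maxLines) return null;
      return content;
    } catch {
      return null; // deleted/moved/unreadable file → snippet silently omitted
    }
  };
}
```

Splitting on the *last* `:` keeps paths containing colons (rare, but legal on some filesystems) working for the common case where only the range suffix uses one.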
## Files Changed

| File | Change |
|------|--------|
| `componets/config/xindexConfig.ts` | **NEW** — `IXindexConfig` type with 4 fields |
| `componets/config/loadConfig.ts` | **NEW** — `LoadConfig` HOF, reads `.xindex.json` (optional), applies defaults, warns via `ILogger` |
| `componets/keywords/cleanUpKeywords.ts` | Add `ignoreKeywords` param + `ignoreSet` filter |
| `componets/walkFiles.ts` | Add `ignoreFiles` param, feed into `ignore()` instances |
| `componets/watchFiles.ts` | Add `ignoreFiles` param, feed into `loadGitignore()` `ignore()` instance |
| `componets/buildComponents.ts` | Add `{log}` param, load config, pass `ignoreKeywords` to `CleanUpKeywords`, return `config` |
| `componets/index/readSnippet.ts` | **NEW** — `ReadSnippet` HOF, reads file lines for a search result |
| `apps/mcpApp.ts` | Add `config` dep, `maxSnippetResults`/`maxSnippetLines` schema params, snippet formatting |
| `apps/run.mcp.ts` | Pass `config` to `McpApp`, `ignoreFiles` to `WalkFiles`/`WatchFiles`, `log` to `BuildComponents` |
| `apps/run.search.ts` | Pass `log` to `BuildComponents`, use `config` for snippet formatting |
| `apps/run.index.ts` | Pass `log` to `BuildComponents`, `ignoreFiles` to `WalkFiles` |
| `apps/run.watch.ts` | Pass `log` to `BuildComponents`, `ignoreFiles` to `WalkFiles`/`WatchFiles` |

## Example `.xindex.json`

```json
{
  "ignoreKeywords": ["import", "export", "const", "function", "return", "async", "await"],
  "ignoreFiles": ["*.test.ts", "*.spec.ts", "rnd/**", "dist/**"],
  "maxSnippetLines": 7,
  "maxSnippetResults": 3
}
```

# Log: xindex-watch — Continuous Indexing with File Watcher

### 2026-04-10

- Task created from user notes: "file watcher → apply to indexer → xindex-index runs → indexes provided or cwd → watches for changes → created/updated/moved/deleted → queued to stream → index content"
- Task created from user notes: "file watcher → apply to indexer → xindex-index runs → indexes provided or cwd → watches for changes → created/updated/moved/deleted → queued to stream → index content"
- Scouted codebase: IndexApp already uses streamx pipeline (`from → tap → map → run`), WalkFiles is async generator, Writer supports push-based streaming, merge combines streams
- Key decision: use `node:fs/promises` `watch()` (recursive, async iterable) — no external dep needed
- Key decision: tagged union `{type:"index"|"remove", path}` as common event shape for walk + watch streams
- Key decision: merge initial walk stream + watch stream into single pipeline
- Debounce needed: editors fire multiple events per save (write temp → rename → delete old)
- RemoveContent needed: Vectra has `deleteItem()` but no HOF wrapper exists yet
- **Clarification round resolved:**
  - Watch is always on (no optional flag) — index all, then watch. Default behavior.
  - Default to cwd when no args
  - Event→Vectra: created→add, updated→delete+add, deleted→delete, moved→delete(old)+add(new)
  - Binary filtering: deferred (TODO), keep simple for now
  - Graceful shutdown: SIGINT → stop processing → ignore queued → exit
  - Watching individual files: works fine with fs.watch, no issue
- **Consistency check:** fixed 6 issues — removed optional watch flag, added graceful shutdown to diagram/steps, clarified update=delete+add semantics, marked binary filtering as TODO
- **User clarification:** separate entry points — `xindex-watch` (new, continuous) vs `xindex-index` (existing, one-time). Both default to cwd.
- **Design decision:** Vectra `upsertItem` handles both add and update — no need for delete+add on updates, just upsert
- **Design decision:** WatchApp is a new HOF in `apps/watchApp.ts`; IndexApp stays unchanged; no modifications to MCP/search paths
- **Design decision:** WatchFiles uses `Writer<FileEvent>` to push events into a streamx-compatible stream; `stop()` closes watchers + finishes writer
- **Implementation pivot:** streamx `Writer`/`merge`/`batchTimed` depend on `@handy/fun` (not installed in xindex). Rewrote to use plain async generators instead — simpler, no new deps. Two-phase approach: walk+index first, then watch+process.
- **Debounce approach:** collect events in a Map (keyed by path, last event wins), flush after a 150ms quiet period. Replaces batchTimed.
- **Watcher uses AbortController** for clean shutdown — `fs.watch` accepts `signal` option natively.
- **All steps implemented and verified:**
  - Step 1: RemoveContent HOF + wired into ContentIndexDriver + BuildComponents ✓
  - Step 2: WatchFiles component with debounced async generator ✓
  - Step 3: WatchApp with two-phase (walk then watch) ✓
  - Step 4: run.watch.ts + bin/xindex-watch + package.json + run.index.ts default ✓
- **Tested:** initial index, file create detection, file delete detection, SIGINT graceful shutdown — all pass
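
The debounce logged above (events collected in a Map keyed by path, last event wins, flush after a quiet period) can be sketched like this. Names (`FileEvent`, `Debounce`, `onFlush`) are illustrative, not the actual xindex internals.

```typescript
type FileEvent = {type: "index" | "remove"; path: string};

export function Debounce({quietMs, onFlush}: {
  quietMs: number;                        // e.g. 150 in the plan above
  onFlush: (events: FileEvent[]) => void; // receives one coalesced batch
}) {
  const pending = new Map<string, FileEvent>();
  let timer: ReturnType<typeof setTimeout> | undefined;
  return function push(event: FileEvent) {
    pending.set(event.path, event); // last event per path wins
    if (timer) clearTimeout(timer); // any new event restarts the quiet period
    timer = setTimeout(() => {
      const batch = [...pending.values()];
      pending.clear();
      onFlush(batch);
    }, quietMs);
  };
}
```

This coalesces the editor's write-temp → rename → delete burst into a single event per path, which is exactly why "last event wins" is the right merge rule here.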