@dreb/coding-agent 1.16.0 → 1.18.0
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/README.md +19 -9
- package/agents/code-reviewer.md +1 -1
- package/agents/completeness-checker.md +1 -1
- package/agents/error-auditor.md +1 -1
- package/agents/explore.md +1 -1
- package/agents/feature-dev.md +1 -1
- package/agents/independent-assessor.md +1 -1
- package/agents/simplifier.md +1 -1
- package/agents/test-reviewer.md +1 -1
- package/dist/core/search/embedder.d.ts +1 -0
- package/dist/core/search/embedder.d.ts.map +1 -1
- package/dist/core/search/embedder.js +38 -23
- package/dist/core/search/embedder.js.map +1 -1
- package/dist/core/search/scanner.d.ts.map +1 -1
- package/dist/core/search/scanner.js +9 -0
- package/dist/core/search/scanner.js.map +1 -1
- package/dist/core/search/search.d.ts +10 -1
- package/dist/core/search/search.d.ts.map +1 -1
- package/dist/core/search/search.js +141 -97
- package/dist/core/search/search.js.map +1 -1
- package/dist/core/system-prompt.d.ts.map +1 -1
- package/dist/core/system-prompt.js +7 -1
- package/dist/core/system-prompt.js.map +1 -1
- package/dist/core/tools/dreb-paths.d.ts +17 -0
- package/dist/core/tools/dreb-paths.d.ts.map +1 -0
- package/dist/core/tools/dreb-paths.js +43 -0
- package/dist/core/tools/dreb-paths.js.map +1 -0
- package/dist/core/tools/find.d.ts.map +1 -1
- package/dist/core/tools/find.js +8 -0
- package/dist/core/tools/find.js.map +1 -1
- package/dist/core/tools/grep.d.ts.map +1 -1
- package/dist/core/tools/grep.js +8 -0
- package/dist/core/tools/grep.js.map +1 -1
- package/dist/core/tools/search.d.ts +19 -1
- package/dist/core/tools/search.d.ts.map +1 -1
- package/dist/core/tools/search.js +38 -10
- package/dist/core/tools/search.js.map +1 -1
- package/package.json +1 -1
package/README.md
CHANGED
|
@@ -41,14 +41,8 @@ What you get in exchange: a skill system, an extension API, custom agent definit
|
|
|
41
41
|
|
|
42
42
|
## Quick Start
|
|
43
43
|
|
|
44
|
-
Clone and build:
|
|
45
|
-
|
|
46
44
|
```bash
|
|
47
|
-
|
|
48
|
-
cd dreb
|
|
49
|
-
npm install
|
|
50
|
-
npm run build
|
|
51
|
-
npm link -w packages/coding-agent
|
|
45
|
+
npm install -g @dreb/coding-agent
|
|
52
46
|
```
|
|
53
47
|
|
|
54
48
|
Authenticate with an API key:
|
|
@@ -69,10 +63,24 @@ Or use a custom provider (corporate proxy, Bedrock, etc.) — see [Custom provid
|
|
|
69
63
|
|
|
70
64
|
Then just talk to dreb. All 10 built-in tools are enabled by default: `read`, `write`, `edit`, `bash`, `grep`, `find`, `ls`, `web_search`, `web_fetch`, and `subagent`. Use `--tools` to restrict to a subset (e.g., `--tools read,grep,find,ls` for read-only). Three additional tools — `search`, `skill`, and `tasks_update` — are always active. The model uses these to fulfill your requests. Add capabilities via [skills](#skills), [prompt templates](#prompt-templates), [extensions](#extensions), or [packages](#packages).
|
|
71
65
|
|
|
66
|
+
**Also available:** [`@dreb/telegram`](https://www.npmjs.com/package/@dreb/telegram) — run dreb as a Telegram bot (`npm install -g @dreb/telegram`).
|
|
67
|
+
|
|
72
68
|
**Platform notes:** [Windows](docs/windows.md) | [Termux (Android)](docs/termux.md) | [tmux](docs/tmux.md) | [Terminal setup](docs/terminal-setup.md) | [Shell aliases](docs/shell-aliases.md)
|
|
73
69
|
|
|
74
70
|
---
|
|
75
71
|
|
|
72
|
+
### Building from source
|
|
73
|
+
|
|
74
|
+
```bash
|
|
75
|
+
git clone https://github.com/aebrer/dreb.git
|
|
76
|
+
cd dreb
|
|
77
|
+
npm install
|
|
78
|
+
npm run build
|
|
79
|
+
npm link -w packages/coding-agent
|
|
80
|
+
```
|
|
81
|
+
|
|
82
|
+
---
|
|
83
|
+
|
|
76
84
|
## Providers & Models
|
|
77
85
|
|
|
78
86
|
For each built-in provider, dreb maintains a list of tool-capable models, updated with every release. Authenticate via subscription (`/login`) or API key, then select any model from that provider via `/model` (or Ctrl+L).
|
|
@@ -325,7 +333,9 @@ Task tracking is prompt-driven: the system prompt includes guidelines for when t
|
|
|
325
333
|
|
|
326
334
|
The `search` tool provides natural language queries over the codebase using embeddings and full-text search. It supports identifier queries (e.g., `AuthMiddleware`), natural language (e.g., `where is rate limiting handled`), and path queries (e.g., `src/auth/`).
|
|
327
335
|
|
|
328
|
-
**
|
|
336
|
+
**Parameters:** `query` (required), `path` (restrict to subdirectory), `limit` (max results, default 20), `projectDir` (index/search a different directory instead of cwd — useful for Telegram sessions where cwd is `~/`), `rebuild` (force a clean re-index when results look stale or corrupt).
|
|
337
|
+
|
|
338
|
+
**How it works:** The first query builds a project index (typically 10–60s, longer for very large repos). Subsequent queries use the cached index, with incremental re-indexing for changed files (mtime-based). Each unique `projectDir` gets its own independent index.
|
|
329
339
|
|
|
330
340
|
**Indexing pipeline:**
|
|
331
341
|
- AST-aware code chunking via tree-sitter (TypeScript, JavaScript, Python, Go, Rust, Java, C, C++) — extracts functions, classes, methods, and exports as individual chunks
|
|
@@ -334,7 +344,7 @@ The `search` tool provides natural language queries over the codebase using embe
|
|
|
334
344
|
|
|
335
345
|
**Ranking:** Uses POEM (Pareto-Optimal Embedding-based Multiranking) with 6 metrics: FTS5 BM25, vector cosine similarity, path match, symbol match, import graph proximity, and git recency. Short identifier queries bias toward exact text matches; long natural language queries bias toward vector similarity.
|
|
336
346
|
|
|
337
|
-
**Storage:** Project index at `.dreb/index/`, memory files indexed alongside code. Works offline after the initial model download.
|
|
347
|
+
**Storage:** Project index at `.dreb/index/`, memory files indexed alongside code. Add `**/.dreb/` to your project's `.gitignore`. Works offline after the initial model download.
|
|
338
348
|
|
|
339
349
|
**Requirements:** Node.js 22+ (uses built-in `node:sqlite`). On older Node versions the tool is silently unavailable — no crash, it simply doesn't register. Zero native addons — uses `web-tree-sitter` (WASM) and `@huggingface/transformers` (WASM).
|
|
340
350
|
|
package/agents/code-reviewer.md
CHANGED
package/agents/error-auditor.md
CHANGED
package/agents/explore.md
CHANGED
package/agents/feature-dev.md
CHANGED
package/agents/simplifier.md
CHANGED
package/agents/test-reviewer.md
CHANGED
|
@@ -1 +1 @@
|
|
|
1
|
-
{"version":3,"file":"embedder.d.ts","sourceRoot":"","sources":["../../../src/core/search/embedder.ts"],"names":[],"mappings":"AAAA;;;;;;GAMG;AAEH,OAAO,KAAK,EAAE,qBAAqB,EAAE,MAAM,YAAY,CAAC;AAMxD,MAAM,WAAW,eAAe;IAC/B,+EAA+E;IAC/E,aAAa,EAAE,MAAM,CAAC;IACtB,kEAAkE;IAClE,SAAS,CAAC,EAAE,MAAM,CAAC;IACnB,uDAAuD;IACvD,SAAS,CAAC,EAAE,MAAM,CAAC;CACnB;AAsBD,qBAAa,QAAQ;IACpB,OAAO,CAAC,QAAQ,CAAC,aAAa,CAAS;IACvC,OAAO,CAAC,QAAQ,CAAC,SAAS,CAAS;IACnC,OAAO,CAAC,QAAQ,CAAC,SAAS,CAAS;IACnC,OAAO,CAAC,SAAS,CAAoB;IACrC,OAAO,CAAC,iBAAiB,CAAuB;IAEhD,YAAY,OAAO,EAAE,eAAe,EAInC;IAED;;;;;OAKG;IACG,UAAU,IAAI,OAAO,CAAC,IAAI,CAAC,
|
|
1
|
+
{"version":3,"file":"embedder.d.ts","sourceRoot":"","sources":["../../../src/core/search/embedder.ts"],"names":[],"mappings":"AAAA;;;;;;GAMG;AAEH,OAAO,KAAK,EAAE,qBAAqB,EAAE,MAAM,YAAY,CAAC;AAMxD,MAAM,WAAW,eAAe;IAC/B,+EAA+E;IAC/E,aAAa,EAAE,MAAM,CAAC;IACtB,kEAAkE;IAClE,SAAS,CAAC,EAAE,MAAM,CAAC;IACnB,uDAAuD;IACvD,SAAS,CAAC,EAAE,MAAM,CAAC;CACnB;AAsBD,qBAAa,QAAQ;IACpB,OAAO,CAAC,QAAQ,CAAC,aAAa,CAAS;IACvC,OAAO,CAAC,QAAQ,CAAC,SAAS,CAAS;IACnC,OAAO,CAAC,QAAQ,CAAC,SAAS,CAAS;IACnC,OAAO,CAAC,SAAS,CAAoB;IACrC,OAAO,CAAC,WAAW,CAA8B;IACjD,OAAO,CAAC,iBAAiB,CAAuB;IAEhD,YAAY,OAAO,EAAE,eAAe,EAInC;IAED;;;;;OAKG;IACG,UAAU,IAAI,OAAO,CAAC,IAAI,CAAC,CAuChC;IAED;;;;;OAKG;IACG,cAAc,CAAC,KAAK,EAAE,MAAM,EAAE,EAAE,UAAU,CAAC,EAAE,qBAAqB,GAAG,OAAO,CAAC,YAAY,EAAE,CAAC,CAiCjG;IAED;;;;OAIG;IACG,UAAU,CAAC,KAAK,EAAE,MAAM,GAAG,OAAO,CAAC,YAAY,CAAC,CAgBrD;IAED,uGAAuG;IACvG,IAAI,SAAS,IAAI,MAAM,CAEtB;IAED,2CAA2C;IAC3C,OAAO,IAAI,IAAI,CASd;IAED,oDAAoD;IACpD,OAAO,CAAC,iBAAiB;CAKzB","sourcesContent":["/**\n * Embedding pipeline using @huggingface/transformers.\n *\n * Generates normalized embeddings for semantic search. The embedding\n * dimension is derived at runtime from the model's actual output.\n * First use downloads the model (~23MB) to the configured cache directory.\n */\n\nimport type { IndexProgressCallback } from \"./types.js\";\n\n// ============================================================================\n// Types\n// ============================================================================\n\nexport interface EmbedderOptions {\n\t/** Absolute path to the model cache directory (e.g. ~/.dreb/agent/models/). */\n\tmodelCacheDir: string;\n\t/** HuggingFace model name. Default: 'Xenova/all-MiniLM-L6-v2'. */\n\tmodelName?: string;\n\t/** Number of texts to embed per batch. Default: 32. */\n\tbatchSize?: number;\n}\n\n// ============================================================================\n// Constants\n// ============================================================================\n\nconst DEFAULT_MODEL_NAME = \"Xenova/all-MiniLM-L6-v2\";\nconst DEFAULT_BATCH_SIZE = 32;\nconst DEFAULT_DIMENSION = 384;\n\n/**\n * Model-specific prefixes for document vs query embeddings.\n * nomic-embed-text-v1.5 requires these; most other models don't.\n */\nconst MODEL_PREFIXES: Record<string, { document: string; query: string }> = {\n\t\"nomic-ai/nomic-embed-text-v1.5\": { document: \"search_document: \", query: \"search_query: \" },\n};\n\n// ============================================================================\n// Embedder\n// ============================================================================\n\nexport class Embedder {\n\tprivate readonly modelCacheDir: string;\n\tprivate readonly modelName: string;\n\tprivate readonly batchSize: number;\n\tprivate extractor: any | null = null;\n\tprivate initPromise: Promise<void> | null = null;\n\tprivate resolvedDimension: number | null = null;\n\n\tconstructor(options: EmbedderOptions) {\n\t\tthis.modelCacheDir = options.modelCacheDir;\n\t\tthis.modelName = options.modelName ?? DEFAULT_MODEL_NAME;\n\t\tthis.batchSize = options.batchSize ?? DEFAULT_BATCH_SIZE;\n\t}\n\n\t/**\n\t * Initialize the model pipeline. Must be called before embedding.\n\t *\n\t * On first use this downloads the ONNX model to `modelCacheDir`.\n\t * Subsequent calls reuse the cached model.\n\t */\n\tasync initialize(): Promise<void> {\n\t\tif (this.extractor) return;\n\t\tif (this.initPromise) return this.initPromise;\n\n\t\tthis.initPromise = (async () => {\n\t\t\ttry {\n\t\t\t\t// Dynamic import — @huggingface/transformers is a heavy dependency\n\t\t\t\tconst { pipeline, env } = await import(\"@huggingface/transformers\");\n\n\t\t\t\t// Direct the model cache to our managed directory\n\t\t\t\tenv.cacheDir = this.modelCacheDir;\n\n\t\t\t\t// Suppress the onnxruntime native addon warning — WASM fallback is fine\n\t\t\t\t// The library tries to load native onnxruntime first and logs a warning\n\t\t\t\t// when it falls back to WASM. We suppress this to avoid confusing users.\n\t\t\t\tconst originalWarn = console.warn;\n\t\t\t\tconsole.warn = (...args: any[]) => {\n\t\t\t\t\tconst msg = typeof args[0] === \"string\" ? args[0] : \"\";\n\t\t\t\t\tif (msg.includes(\"onnxruntime\") || msg.includes(\"ONNX\")) return;\n\t\t\t\t\toriginalWarn.apply(console, args);\n\t\t\t\t};\n\n\t\t\t\ttry {\n\t\t\t\t\tthis.extractor = await pipeline(\"feature-extraction\", this.modelName, {\n\t\t\t\t\t\tdtype: \"q8\" as any,\n\t\t\t\t\t\tdevice: \"cpu\" as any,\n\t\t\t\t\t});\n\t\t\t\t} finally {\n\t\t\t\t\tconsole.warn = originalWarn;\n\t\t\t\t}\n\t\t\t} catch (err) {\n\t\t\t\t// Reset so subsequent calls can retry instead of returning\n\t\t\t\t// the same rejected promise forever\n\t\t\t\tthis.initPromise = null;\n\t\t\t\tthrow err;\n\t\t\t}\n\t\t})();\n\n\t\treturn this.initPromise;\n\t}\n\n\t/**\n\t * Embed documents for indexing.\n\t *\n\t * Applies model-specific prefixes if required, then processes texts\n\t * in batches of `batchSize` for memory efficiency.\n\t */\n\tasync embedDocuments(texts: string[], onProgress?: IndexProgressCallback): Promise<Float32Array[]> {\n\t\tthis.ensureInitialized();\n\n\t\tconst results: Float32Array[] = [];\n\t\tconst total = texts.length;\n\t\tconst prefix = MODEL_PREFIXES[this.modelName]?.document ?? \"\";\n\n\t\tfor (let i = 0; i < total; i += this.batchSize) {\n\t\t\tconst batch = texts.slice(i, i + this.batchSize);\n\t\t\tconst prefixed = prefix ? batch.map((t) => prefix + t) : batch;\n\n\t\t\tconst output = await this.extractor!(prefixed, {\n\t\t\t\tpooling: \"mean\",\n\t\t\t\tnormalize: true,\n\t\t\t});\n\n\t\t\t// output.data is a flat Float32Array of shape [batchLen, dim]\n\t\t\tconst data: Float32Array = output.data;\n\t\t\tconst dim = data.length / batch.length;\n\t\t\tif (this.resolvedDimension === null) {\n\t\t\t\tthis.resolvedDimension = dim;\n\t\t\t}\n\t\t\tfor (let j = 0; j < batch.length; j++) {\n\t\t\t\tconst start = j * dim;\n\t\t\t\tresults.push(data.slice(start, start + dim));\n\t\t\t}\n\n\t\t\tif (onProgress) {\n\t\t\t\tonProgress(\"embedding\", Math.min(i + batch.length, total), total);\n\t\t\t}\n\t\t}\n\n\t\treturn results;\n\t}\n\n\t/**\n\t * Embed a query for search.\n\t *\n\t * Applies model-specific query prefix if required.\n\t */\n\tasync embedQuery(query: string): Promise<Float32Array> {\n\t\tthis.ensureInitialized();\n\n\t\tconst prefix = MODEL_PREFIXES[this.modelName]?.query ?? \"\";\n\t\tconst output = await this.extractor!(prefix + query, {\n\t\t\tpooling: \"mean\",\n\t\t\tnormalize: true,\n\t\t});\n\n\t\t// Single input — derive dimension and slice to exactly one vector\n\t\tconst data: Float32Array = output.data;\n\t\tconst dim = data.length;\n\t\tif (this.resolvedDimension === null) {\n\t\t\tthis.resolvedDimension = dim;\n\t\t}\n\t\treturn data.slice(0, dim);\n\t}\n\n\t/** Get the embedding dimension. Returns the model's actual dimension once known, or 384 as default. */\n\tget dimension(): number {\n\t\treturn this.resolvedDimension ?? DEFAULT_DIMENSION;\n\t}\n\n\t/** Dispose the pipeline to free memory. */\n\tdispose(): void {\n\t\tif (this.extractor) {\n\t\t\t// The pipeline object may have a dispose method depending on the version\n\t\t\tif (typeof this.extractor.dispose === \"function\") {\n\t\t\t\tthis.extractor.dispose();\n\t\t\t}\n\t\t\tthis.extractor = null;\n\t\t}\n\t\tthis.initPromise = null;\n\t}\n\n\t/** Throw if initialize() hasn't been called yet. */\n\tprivate ensureInitialized(): void {\n\t\tif (!this.extractor) {\n\t\t\tthrow new Error(\"Embedder not initialized. Call initialize() first.\");\n\t\t}\n\t}\n}\n"]}
|
|
@@ -26,6 +26,7 @@ export class Embedder {
|
|
|
26
26
|
modelName;
|
|
27
27
|
batchSize;
|
|
28
28
|
extractor = null;
|
|
29
|
+
initPromise = null;
|
|
29
30
|
resolvedDimension = null;
|
|
30
31
|
constructor(options) {
|
|
31
32
|
this.modelCacheDir = options.modelCacheDir;
|
|
@@ -41,29 +42,42 @@ export class Embedder {
|
|
|
41
42
|
async initialize() {
|
|
42
43
|
if (this.extractor)
|
|
43
44
|
return;
|
|
44
|
-
|
|
45
|
-
|
|
46
|
-
|
|
47
|
-
|
|
48
|
-
|
|
49
|
-
|
|
50
|
-
|
|
51
|
-
|
|
52
|
-
|
|
53
|
-
|
|
54
|
-
|
|
55
|
-
|
|
56
|
-
|
|
57
|
-
|
|
58
|
-
|
|
59
|
-
|
|
60
|
-
|
|
61
|
-
|
|
62
|
-
|
|
63
|
-
|
|
64
|
-
|
|
65
|
-
|
|
66
|
-
|
|
45
|
+
if (this.initPromise)
|
|
46
|
+
return this.initPromise;
|
|
47
|
+
this.initPromise = (async () => {
|
|
48
|
+
try {
|
|
49
|
+
// Dynamic import — @huggingface/transformers is a heavy dependency
|
|
50
|
+
const { pipeline, env } = await import("@huggingface/transformers");
|
|
51
|
+
// Direct the model cache to our managed directory
|
|
52
|
+
env.cacheDir = this.modelCacheDir;
|
|
53
|
+
// Suppress the onnxruntime native addon warning — WASM fallback is fine
|
|
54
|
+
// The library tries to load native onnxruntime first and logs a warning
|
|
55
|
+
// when it falls back to WASM. We suppress this to avoid confusing users.
|
|
56
|
+
const originalWarn = console.warn;
|
|
57
|
+
console.warn = (...args) => {
|
|
58
|
+
const msg = typeof args[0] === "string" ? args[0] : "";
|
|
59
|
+
if (msg.includes("onnxruntime") || msg.includes("ONNX"))
|
|
60
|
+
return;
|
|
61
|
+
originalWarn.apply(console, args);
|
|
62
|
+
};
|
|
63
|
+
try {
|
|
64
|
+
this.extractor = await pipeline("feature-extraction", this.modelName, {
|
|
65
|
+
dtype: "q8",
|
|
66
|
+
device: "cpu",
|
|
67
|
+
});
|
|
68
|
+
}
|
|
69
|
+
finally {
|
|
70
|
+
console.warn = originalWarn;
|
|
71
|
+
}
|
|
72
|
+
}
|
|
73
|
+
catch (err) {
|
|
74
|
+
// Reset so subsequent calls can retry instead of returning
|
|
75
|
+
// the same rejected promise forever
|
|
76
|
+
this.initPromise = null;
|
|
77
|
+
throw err;
|
|
78
|
+
}
|
|
79
|
+
})();
|
|
80
|
+
return this.initPromise;
|
|
67
81
|
}
|
|
68
82
|
/**
|
|
69
83
|
* Embed documents for indexing.
|
|
@@ -132,6 +146,7 @@ export class Embedder {
|
|
|
132
146
|
}
|
|
133
147
|
this.extractor = null;
|
|
134
148
|
}
|
|
149
|
+
this.initPromise = null;
|
|
135
150
|
}
|
|
136
151
|
/** Throw if initialize() hasn't been called yet. */
|
|
137
152
|
ensureInitialized() {
|
|
@@ -1 +1 @@
|
|
|
1
|
-
{"version":3,"file":"embedder.js","sourceRoot":"","sources":["../../../src/core/search/embedder.ts"],"names":[],"mappings":"AAAA;;;;;;GAMG;AAiBH,+EAA+E;AAC/E,YAAY;AACZ,+EAA+E;AAE/E,MAAM,kBAAkB,GAAG,yBAAyB,CAAC;AACrD,MAAM,kBAAkB,GAAG,EAAE,CAAC;AAC9B,MAAM,iBAAiB,GAAG,GAAG,CAAC;AAE9B;;;GAGG;AACH,MAAM,cAAc,GAAwD;IAC3E,gCAAgC,EAAE,EAAE,QAAQ,EAAE,mBAAmB,EAAE,KAAK,EAAE,gBAAgB,EAAE;CAC5F,CAAC;AAEF,+EAA+E;AAC/E,WAAW;AACX,+EAA+E;AAE/E,MAAM,OAAO,QAAQ;IACH,aAAa,CAAS;IACtB,SAAS,CAAS;IAClB,SAAS,CAAS;IAC3B,SAAS,GAAe,IAAI,CAAC;IAC7B,iBAAiB,GAAkB,IAAI,CAAC;IAEhD,YAAY,OAAwB,EAAE;QACrC,IAAI,CAAC,aAAa,GAAG,OAAO,CAAC,aAAa,CAAC;QAC3C,IAAI,CAAC,SAAS,GAAG,OAAO,CAAC,SAAS,IAAI,kBAAkB,CAAC;QACzD,IAAI,CAAC,SAAS,GAAG,OAAO,CAAC,SAAS,IAAI,kBAAkB,CAAC;IAAA,CACzD;IAED;;;;;OAKG;IACH,KAAK,CAAC,UAAU,GAAkB;QACjC,IAAI,IAAI,CAAC,SAAS;YAAE,OAAO;QAE3B,qEAAmE;QACnE,MAAM,EAAE,QAAQ,EAAE,GAAG,EAAE,GAAG,MAAM,MAAM,CAAC,2BAA2B,CAAC,CAAC;QAEpE,kDAAkD;QAClD,GAAG,CAAC,QAAQ,GAAG,IAAI,CAAC,aAAa,CAAC;QAElC,0EAAwE;QACxE,wEAAwE;QACxE,yEAAyE;QACzE,MAAM,YAAY,GAAG,OAAO,CAAC,IAAI,CAAC;QAClC,OAAO,CAAC,IAAI,GAAG,CAAC,GAAG,IAAW,EAAE,EAAE,CAAC;YAClC,MAAM,GAAG,GAAG,OAAO,IAAI,CAAC,CAAC,CAAC,KAAK,QAAQ,CAAC,CAAC,CAAC,IAAI,CAAC,CAAC,CAAC,CAAC,CAAC,CAAC,EAAE,CAAC;YACvD,IAAI,GAAG,CAAC,QAAQ,CAAC,aAAa,CAAC,IAAI,GAAG,CAAC,QAAQ,CAAC,MAAM,CAAC;gBAAE,OAAO;YAChE,YAAY,CAAC,KAAK,CAAC,OAAO,EAAE,IAAI,CAAC,CAAC;QAAA,CAClC,CAAC;QAEF,IAAI,CAAC;YACJ,IAAI,CAAC,SAAS,GAAG,MAAM,QAAQ,CAAC,oBAAoB,EAAE,IAAI,CAAC,SAAS,EAAE;gBACrE,KAAK,EAAE,IAAW;gBAClB,MAAM,EAAE,KAAY;aACpB,CAAC,CAAC;QACJ,CAAC;gBAAS,CAAC;YACV,OAAO,CAAC,IAAI,GAAG,YAAY,CAAC;QAC7B,CAAC;IAAA,CACD;IAED;;;;;OAKG;IACH,KAAK,CAAC,cAAc,CAAC,KAAe,EAAE,UAAkC,EAA2B;QAClG,IAAI,CAAC,iBAAiB,EAAE,CAAC;QAEzB,MAAM,OAAO,GAAmB,EAAE,CAAC;QACnC,MAAM,KAAK,GAAG,KAAK,CAAC,MAAM,CAAC;QAC3B,MAAM,MAAM,GAAG,cAAc,CAAC,IAAI,CAAC,SAAS,CAAC,EAAE,QAAQ,IAAI,EAAE,CAAC;QAE9D,KAAK,IAAI,CAAC,GAAG,CAAC,EAAE,CAAC,GAAG,KAAK,EAAE,CAAC,IAAI,IAAI,CAAC,SAAS,EAAE,CAAC;YAChD,MAAM,KAAK,GAAG,KAAK,CAAC,KAAK,CAAC,CAAC,EAAE,CAAC,GAAG,IAAI,CAAC,SAAS,CAAC,CAAC;YACjD,MAAM,QAAQ,GAAG,MAAM,CAAC,CAAC,CAAC,KAAK,CAAC,GAAG,CAAC,CAAC,CAAC,EAAE,EAAE,CAAC,MAAM,GAAG,CAAC,CAAC,CAAC,CAAC,CAAC,KAAK,CAAC;YAE/D,MAAM,MAAM,GAAG,MAAM,IAAI,CAAC,SAAU,CAAC,QAAQ,EAAE;gBAC9C,OAAO,EAAE,MAAM;gBACf,SAAS,EAAE,IAAI;aACf,CAAC,CAAC;YAEH,8DAA8D;YAC9D,MAAM,IAAI,GAAiB,MAAM,CAAC,IAAI,CAAC;YACvC,MAAM,GAAG,GAAG,IAAI,CAAC,MAAM,GAAG,KAAK,CAAC,MAAM,CAAC;YACvC,IAAI,IAAI,CAAC,iBAAiB,KAAK,IAAI,EAAE,CAAC;gBACrC,IAAI,CAAC,iBAAiB,GAAG,GAAG,CAAC;YAC9B,CAAC;YACD,KAAK,IAAI,CAAC,GAAG,CAAC,EAAE,CAAC,GAAG,KAAK,CAAC,MAAM,EAAE,CAAC,EAAE,EAAE,CAAC;gBACvC,MAAM,KAAK,GAAG,CAAC,GAAG,GAAG,CAAC;gBACtB,OAAO,CAAC,IAAI,CAAC,IAAI,CAAC,KAAK,CAAC,KAAK,EAAE,KAAK,GAAG,GAAG,CAAC,CAAC,CAAC;YAC9C,CAAC;YAED,IAAI,UAAU,EAAE,CAAC;gBAChB,UAAU,CAAC,WAAW,EAAE,IAAI,CAAC,GAAG,CAAC,CAAC,GAAG,KAAK,CAAC,MAAM,EAAE,KAAK,CAAC,EAAE,KAAK,CAAC,CAAC;YACnE,CAAC;QACF,CAAC;QAED,OAAO,OAAO,CAAC;IAAA,CACf;IAED;;;;OAIG;IACH,KAAK,CAAC,UAAU,CAAC,KAAa,EAAyB;QACtD,IAAI,CAAC,iBAAiB,EAAE,CAAC;QAEzB,MAAM,MAAM,GAAG,cAAc,CAAC,IAAI,CAAC,SAAS,CAAC,EAAE,KAAK,IAAI,EAAE,CAAC;QAC3D,MAAM,MAAM,GAAG,MAAM,IAAI,CAAC,SAAU,CAAC,MAAM,GAAG,KAAK,EAAE;YACpD,OAAO,EAAE,MAAM;YACf,SAAS,EAAE,IAAI;SACf,CAAC,CAAC;QAEH,oEAAkE;QAClE,MAAM,IAAI,GAAiB,MAAM,CAAC,IAAI,CAAC;QACvC,MAAM,GAAG,GAAG,IAAI,CAAC,MAAM,CAAC;QACxB,IAAI,IAAI,CAAC,iBAAiB,KAAK,IAAI,EAAE,CAAC;YACrC,IAAI,CAAC,iBAAiB,GAAG,GAAG,CAAC;QAC9B,CAAC;QACD,OAAO,IAAI,CAAC,KAAK,CAAC,CAAC,EAAE,GAAG,CAAC,CAAC;IAAA,CAC1B;IAED,uGAAuG;IACvG,IAAI,SAAS,GAAW;QACvB,OAAO,IAAI,CAAC,iBAAiB,IAAI,iBAAiB,CAAC;IAAA,CACnD;IAED,2CAA2C;IAC3C,OAAO,GAAS;QACf,IAAI,IAAI,CAAC,SAAS,EAAE,CAAC;YACpB,yEAAyE;YACzE,IAAI,OAAO,IAAI,CAAC,SAAS,CAAC,OAAO,KAAK,UAAU,EAAE,CAAC;gBAClD,IAAI,CAAC,SAAS,CAAC,OAAO,EAAE,CAAC;YAC1B,CAAC;YACD,IAAI,CAAC,SAAS,GAAG,IAAI,CAAC;QACvB,CAAC;IAAA,CACD;IAED,oDAAoD;IAC5C,iBAAiB,GAAS;QACjC,IAAI,CAAC,IAAI,CAAC,SAAS,EAAE,CAAC;YACrB,MAAM,IAAI,KAAK,CAAC,oDAAoD,CAAC,CAAC;QACvE,CAAC;IAAA,CACD;CACD","sourcesContent":["/**\n * Embedding pipeline using @huggingface/transformers.\n *\n * Generates normalized embeddings for semantic search. The embedding\n * dimension is derived at runtime from the model's actual output.\n * First use downloads the model (~23MB) to the configured cache directory.\n */\n\nimport type { IndexProgressCallback } from \"./types.js\";\n\n// ============================================================================\n// Types\n// ============================================================================\n\nexport interface EmbedderOptions {\n\t/** Absolute path to the model cache directory (e.g. ~/.dreb/agent/models/). */\n\tmodelCacheDir: string;\n\t/** HuggingFace model name. Default: 'Xenova/all-MiniLM-L6-v2'. */\n\tmodelName?: string;\n\t/** Number of texts to embed per batch. Default: 32. */\n\tbatchSize?: number;\n}\n\n// ============================================================================\n// Constants\n// ============================================================================\n\nconst DEFAULT_MODEL_NAME = \"Xenova/all-MiniLM-L6-v2\";\nconst DEFAULT_BATCH_SIZE = 32;\nconst DEFAULT_DIMENSION = 384;\n\n/**\n * Model-specific prefixes for document vs query embeddings.\n * nomic-embed-text-v1.5 requires these; most other models don't.\n */\nconst MODEL_PREFIXES: Record<string, { document: string; query: string }> = {\n\t\"nomic-ai/nomic-embed-text-v1.5\": { document: \"search_document: \", query: \"search_query: \" },\n};\n\n// ============================================================================\n// Embedder\n// ============================================================================\n\nexport class Embedder {\n\tprivate readonly modelCacheDir: string;\n\tprivate readonly modelName: string;\n\tprivate readonly batchSize: number;\n\tprivate extractor: any | null = null;\n\tprivate resolvedDimension: number | null = null;\n\n\tconstructor(options: EmbedderOptions) {\n\t\tthis.modelCacheDir = options.modelCacheDir;\n\t\tthis.modelName = options.modelName ?? DEFAULT_MODEL_NAME;\n\t\tthis.batchSize = options.batchSize ?? DEFAULT_BATCH_SIZE;\n\t}\n\n\t/**\n\t * Initialize the model pipeline. Must be called before embedding.\n\t *\n\t * On first use this downloads the ONNX model to `modelCacheDir`.\n\t * Subsequent calls reuse the cached model.\n\t */\n\tasync initialize(): Promise<void> {\n\t\tif (this.extractor) return;\n\n\t\t// Dynamic import — @huggingface/transformers is a heavy dependency\n\t\tconst { pipeline, env } = await import(\"@huggingface/transformers\");\n\n\t\t// Direct the model cache to our managed directory\n\t\tenv.cacheDir = this.modelCacheDir;\n\n\t\t// Suppress the onnxruntime native addon warning — WASM fallback is fine\n\t\t// The library tries to load native onnxruntime first and logs a warning\n\t\t// when it falls back to WASM. We suppress this to avoid confusing users.\n\t\tconst originalWarn = console.warn;\n\t\tconsole.warn = (...args: any[]) => {\n\t\t\tconst msg = typeof args[0] === \"string\" ? args[0] : \"\";\n\t\t\tif (msg.includes(\"onnxruntime\") || msg.includes(\"ONNX\")) return;\n\t\t\toriginalWarn.apply(console, args);\n\t\t};\n\n\t\ttry {\n\t\t\tthis.extractor = await pipeline(\"feature-extraction\", this.modelName, {\n\t\t\t\tdtype: \"q8\" as any,\n\t\t\t\tdevice: \"cpu\" as any,\n\t\t\t});\n\t\t} finally {\n\t\t\tconsole.warn = originalWarn;\n\t\t}\n\t}\n\n\t/**\n\t * Embed documents for indexing.\n\t *\n\t * Applies model-specific prefixes if required, then processes texts\n\t * in batches of `batchSize` for memory efficiency.\n\t */\n\tasync embedDocuments(texts: string[], onProgress?: IndexProgressCallback): Promise<Float32Array[]> {\n\t\tthis.ensureInitialized();\n\n\t\tconst results: Float32Array[] = [];\n\t\tconst total = texts.length;\n\t\tconst prefix = MODEL_PREFIXES[this.modelName]?.document ?? \"\";\n\n\t\tfor (let i = 0; i < total; i += this.batchSize) {\n\t\t\tconst batch = texts.slice(i, i + this.batchSize);\n\t\t\tconst prefixed = prefix ? batch.map((t) => prefix + t) : batch;\n\n\t\t\tconst output = await this.extractor!(prefixed, {\n\t\t\t\tpooling: \"mean\",\n\t\t\t\tnormalize: true,\n\t\t\t});\n\n\t\t\t// output.data is a flat Float32Array of shape [batchLen, dim]\n\t\t\tconst data: Float32Array = output.data;\n\t\t\tconst dim = data.length / batch.length;\n\t\t\tif (this.resolvedDimension === null) {\n\t\t\t\tthis.resolvedDimension = dim;\n\t\t\t}\n\t\t\tfor (let j = 0; j < batch.length; j++) {\n\t\t\t\tconst start = j * dim;\n\t\t\t\tresults.push(data.slice(start, start + dim));\n\t\t\t}\n\n\t\t\tif (onProgress) {\n\t\t\t\tonProgress(\"embedding\", Math.min(i + batch.length, total), total);\n\t\t\t}\n\t\t}\n\n\t\treturn results;\n\t}\n\n\t/**\n\t * Embed a query for search.\n\t *\n\t * Applies model-specific query prefix if required.\n\t */\n\tasync embedQuery(query: string): Promise<Float32Array> {\n\t\tthis.ensureInitialized();\n\n\t\tconst prefix = MODEL_PREFIXES[this.modelName]?.query ?? \"\";\n\t\tconst output = await this.extractor!(prefix + query, {\n\t\t\tpooling: \"mean\",\n\t\t\tnormalize: true,\n\t\t});\n\n\t\t// Single input — derive dimension and slice to exactly one vector\n\t\tconst data: Float32Array = output.data;\n\t\tconst dim = data.length;\n\t\tif (this.resolvedDimension === null) {\n\t\t\tthis.resolvedDimension = dim;\n\t\t}\n\t\treturn data.slice(0, dim);\n\t}\n\n\t/** Get the embedding dimension. Returns the model's actual dimension once known, or 384 as default. */\n\tget dimension(): number {\n\t\treturn this.resolvedDimension ?? DEFAULT_DIMENSION;\n\t}\n\n\t/** Dispose the pipeline to free memory. */\n\tdispose(): void {\n\t\tif (this.extractor) {\n\t\t\t// The pipeline object may have a dispose method depending on the version\n\t\t\tif (typeof this.extractor.dispose === \"function\") {\n\t\t\t\tthis.extractor.dispose();\n\t\t\t}\n\t\t\tthis.extractor = null;\n\t\t}\n\t}\n\n\t/** Throw if initialize() hasn't been called yet. */\n\tprivate ensureInitialized(): void {\n\t\tif (!this.extractor) {\n\t\t\tthrow new Error(\"Embedder not initialized. Call initialize() first.\");\n\t\t}\n\t}\n}\n"]}
|
|
1
|
+
{"version":3,"file":"embedder.js","sourceRoot":"","sources":["../../../src/core/search/embedder.ts"],"names":[],"mappings":"AAAA;;;;;;GAMG;AAiBH,+EAA+E;AAC/E,YAAY;AACZ,+EAA+E;AAE/E,MAAM,kBAAkB,GAAG,yBAAyB,CAAC;AACrD,MAAM,kBAAkB,GAAG,EAAE,CAAC;AAC9B,MAAM,iBAAiB,GAAG,GAAG,CAAC;AAE9B;;;GAGG;AACH,MAAM,cAAc,GAAwD;IAC3E,gCAAgC,EAAE,EAAE,QAAQ,EAAE,mBAAmB,EAAE,KAAK,EAAE,gBAAgB,EAAE;CAC5F,CAAC;AAEF,+EAA+E;AAC/E,WAAW;AACX,+EAA+E;AAE/E,MAAM,OAAO,QAAQ;IACH,aAAa,CAAS;IACtB,SAAS,CAAS;IAClB,SAAS,CAAS;IAC3B,SAAS,GAAe,IAAI,CAAC;IAC7B,WAAW,GAAyB,IAAI,CAAC;IACzC,iBAAiB,GAAkB,IAAI,CAAC;IAEhD,YAAY,OAAwB,EAAE;QACrC,IAAI,CAAC,aAAa,GAAG,OAAO,CAAC,aAAa,CAAC;QAC3C,IAAI,CAAC,SAAS,GAAG,OAAO,CAAC,SAAS,IAAI,kBAAkB,CAAC;QACzD,IAAI,CAAC,SAAS,GAAG,OAAO,CAAC,SAAS,IAAI,kBAAkB,CAAC;IAAA,CACzD;IAED;;;;;OAKG;IACH,KAAK,CAAC,UAAU,GAAkB;QACjC,IAAI,IAAI,CAAC,SAAS;YAAE,OAAO;QAC3B,IAAI,IAAI,CAAC,WAAW;YAAE,OAAO,IAAI,CAAC,WAAW,CAAC;QAE9C,IAAI,CAAC,WAAW,GAAG,CAAC,KAAK,IAAI,EAAE,CAAC;YAC/B,IAAI,CAAC;gBACJ,qEAAmE;gBACnE,MAAM,EAAE,QAAQ,EAAE,GAAG,EAAE,GAAG,MAAM,MAAM,CAAC,2BAA2B,CAAC,CAAC;gBAEpE,kDAAkD;gBAClD,GAAG,CAAC,QAAQ,GAAG,IAAI,CAAC,aAAa,CAAC;gBAElC,0EAAwE;gBACxE,wEAAwE;gBACxE,yEAAyE;gBACzE,MAAM,YAAY,GAAG,OAAO,CAAC,IAAI,CAAC;gBAClC,OAAO,CAAC,IAAI,GAAG,CAAC,GAAG,IAAW,EAAE,EAAE,CAAC;oBAClC,MAAM,GAAG,GAAG,OAAO,IAAI,CAAC,CAAC,CAAC,KAAK,QAAQ,CAAC,CAAC,CAAC,IAAI,CAAC,CAAC,CAAC,CAAC,CAAC,CAAC,EAAE,CAAC;oBACvD,IAAI,GAAG,CAAC,QAAQ,CAAC,aAAa,CAAC,IAAI,GAAG,CAAC,QAAQ,CAAC,MAAM,CAAC;wBAAE,OAAO;oBAChE,YAAY,CAAC,KAAK,CAAC,OAAO,EAAE,IAAI,CAAC,CAAC;gBAAA,CAClC,CAAC;gBAEF,IAAI,CAAC;oBACJ,IAAI,CAAC,SAAS,GAAG,MAAM,QAAQ,CAAC,oBAAoB,EAAE,IAAI,CAAC,SAAS,EAAE;wBACrE,KAAK,EAAE,IAAW;wBAClB,MAAM,EAAE,KAAY;qBACpB,CAAC,CAAC;gBACJ,CAAC;wBAAS,CAAC;oBACV,OAAO,CAAC,IAAI,GAAG,YAAY,CAAC;gBAC7B,CAAC;YACF,CAAC;YAAC,OAAO,GAAG,EAAE,CAAC;gBACd,2DAA2D;gBAC3D,oCAAoC;gBACpC,IAAI,CAAC,WAAW,GAAG,IAAI,CAAC;gBACxB,MAAM,GAAG,CAAC;YACX,CAAC;QAAA,CACD,CAAC,EAAE,CAAC;QAEL,OAAO,IAAI,CAAC,WAAW,CAAC;IAAA,CACxB;IAED;;;;;OAKG;IACH,KAAK,CAAC,cAAc,CAAC,KAAe,EAAE,UAAkC,EAA2B;QAClG,IAAI,CAAC,iBAAiB,EAAE,CAAC;QAEzB,MAAM,OAAO,GAAmB,EAAE,CAAC;QACnC,MAAM,KAAK,GAAG,KAAK,CAAC,MAAM,CAAC;QAC3B,MAAM,MAAM,GAAG,cAAc,CAAC,IAAI,CAAC,SAAS,CAAC,EAAE,QAAQ,IAAI,EAAE,CAAC;QAE9D,KAAK,IAAI,CAAC,GAAG,CAAC,EAAE,CAAC,GAAG,KAAK,EAAE,CAAC,IAAI,IAAI,CAAC,SAAS,EAAE,CAAC;YAChD,MAAM,KAAK,GAAG,KAAK,CAAC,KAAK,CAAC,CAAC,EAAE,CAAC,GAAG,IAAI,CAAC,SAAS,CAAC,CAAC;YACjD,MAAM,QAAQ,GAAG,MAAM,CAAC,CAAC,CAAC,KAAK,CAAC,GAAG,CAAC,CAAC,CAAC,EAAE,EAAE,CAAC,MAAM,GAAG,CAAC,CAAC,CAAC,CAAC,CAAC,KAAK,CAAC;YAE/D,MAAM,MAAM,GAAG,MAAM,IAAI,CAAC,SAAU,CAAC,QAAQ,EAAE;gBAC9C,OAAO,EAAE,MAAM;gBACf,SAAS,EAAE,IAAI;aACf,CAAC,CAAC;YAEH,8DAA8D;YAC9D,MAAM,IAAI,GAAiB,MAAM,CAAC,IAAI,CAAC;YACvC,MAAM,GAAG,GAAG,IAAI,CAAC,MAAM,GAAG,KAAK,CAAC,MAAM,CAAC;YACvC,IAAI,IAAI,CAAC,iBAAiB,KAAK,IAAI,EAAE,CAAC;gBACrC,IAAI,CAAC,iBAAiB,GAAG,GAAG,CAAC;YAC9B,CAAC;YACD,KAAK,IAAI,CAAC,GAAG,CAAC,EAAE,CAAC,GAAG,KAAK,CAAC,MAAM,EAAE,CAAC,EAAE,EAAE,CAAC;gBACvC,MAAM,KAAK,GAAG,CAAC,GAAG,GAAG,CAAC;gBACtB,OAAO,CAAC,IAAI,CAAC,IAAI,CAAC,KAAK,CAAC,KAAK,EAAE,KAAK,GAAG,GAAG,CAAC,CAAC,CAAC;YAC9C,CAAC;YAED,IAAI,UAAU,EAAE,CAAC;gBAChB,UAAU,CAAC,WAAW,EAAE,IAAI,CAAC,GAAG,CAAC,CAAC,GAAG,KAAK,CAAC,MAAM,EAAE,KAAK,CAAC,EAAE,KAAK,CAAC,CAAC;YACnE,CAAC;QACF,CAAC;QAED,OAAO,OAAO,CAAC;IAAA,CACf;IAED;;;;OAIG;IACH,KAAK,CAAC,UAAU,CAAC,KAAa,EAAyB;QACtD,IAAI,CAAC,iBAAiB,EAAE,CAAC;QAEzB,MAAM,MAAM,GAAG,cAAc,CAAC,IAAI,CAAC,SAAS,CAAC,EAAE,KAAK,IAAI,EAAE,CAAC;QAC3D,MAAM,MAAM,GAAG,MAAM,IAAI,CAAC,SAAU,CAAC,MAAM,GAAG,KAAK,EAAE;YACpD,OAAO,EAAE,MAAM;YACf,SAAS,EAAE,IAAI;SACf,CAAC,CAAC;QAEH,oEAAkE;QAClE,MAAM,IAAI,GAAiB,MAAM,CAAC,IAAI,CAAC;QACvC,MAAM,GAAG,GAAG,IAAI,CAAC,MAAM,CAAC;QACxB,IAAI,IAAI,CAAC,iBAAiB,KAAK,IAAI,EAAE,CAAC;YACrC,IAAI,CAAC,iBAAiB,GAAG,GAAG,CAAC;QAC9B,CAAC;QACD,OAAO,IAAI,CAAC,KAAK,CAAC,CAAC,EAAE,GAAG,CAAC,CAAC;IAAA,CAC1B;IAED,uGAAuG;IACvG,IAAI,SAAS,GAAW;QACvB,OAAO,IAAI,CAAC,iBAAiB,IAAI,iBAAiB,CAAC;IAAA,CACnD;IAED,2CAA2C;IAC3C,OAAO,GAAS;QACf,IAAI,IAAI,CAAC,SAAS,EAAE,CAAC;YACpB,yEAAyE;YACzE,IAAI,OAAO,IAAI,CAAC,SAAS,CAAC,OAAO,KAAK,UAAU,EAAE,CAAC;gBAClD,IAAI,CAAC,SAAS,CAAC,OAAO,EAAE,CAAC;YAC1B,CAAC;YACD,IAAI,CAAC,SAAS,GAAG,IAAI,CAAC;QACvB,CAAC;QACD,IAAI,CAAC,WAAW,GAAG,IAAI,CAAC;IAAA,CACxB;IAED,oDAAoD;IAC5C,iBAAiB,GAAS;QACjC,IAAI,CAAC,IAAI,CAAC,SAAS,EAAE,CAAC;YACrB,MAAM,IAAI,KAAK,CAAC,oDAAoD,CAAC,CAAC;QACvE,CAAC;IAAA,CACD;CACD","sourcesContent":["/**\n * Embedding pipeline using @huggingface/transformers.\n *\n * Generates normalized embeddings for semantic search. The embedding\n * dimension is derived at runtime from the model's actual output.\n * First use downloads the model (~23MB) to the configured cache directory.\n */\n\nimport type { IndexProgressCallback } from \"./types.js\";\n\n// ============================================================================\n// Types\n// ============================================================================\n\nexport interface EmbedderOptions {\n\t/** Absolute path to the model cache directory (e.g. ~/.dreb/agent/models/). */\n\tmodelCacheDir: string;\n\t/** HuggingFace model name. Default: 'Xenova/all-MiniLM-L6-v2'. */\n\tmodelName?: string;\n\t/** Number of texts to embed per batch. Default: 32. */\n\tbatchSize?: number;\n}\n\n// ============================================================================\n// Constants\n// ============================================================================\n\nconst DEFAULT_MODEL_NAME = \"Xenova/all-MiniLM-L6-v2\";\nconst DEFAULT_BATCH_SIZE = 32;\nconst DEFAULT_DIMENSION = 384;\n\n/**\n * Model-specific prefixes for document vs query embeddings.\n * nomic-embed-text-v1.5 requires these; most other models don't.\n */\nconst MODEL_PREFIXES: Record<string, { document: string; query: string }> = {\n\t\"nomic-ai/nomic-embed-text-v1.5\": { document: \"search_document: \", query: \"search_query: \" },\n};\n\n// ============================================================================\n// Embedder\n// ============================================================================\n\nexport class Embedder {\n\tprivate readonly modelCacheDir: string;\n\tprivate readonly modelName: string;\n\tprivate readonly batchSize: number;\n\tprivate extractor: any | null = null;\n\tprivate initPromise: Promise<void> | null = null;\n\tprivate resolvedDimension: number | null = null;\n\n\tconstructor(options: EmbedderOptions) {\n\t\tthis.modelCacheDir = options.modelCacheDir;\n\t\tthis.modelName = options.modelName ?? DEFAULT_MODEL_NAME;\n\t\tthis.batchSize = options.batchSize ?? DEFAULT_BATCH_SIZE;\n\t}\n\n\t/**\n\t * Initialize the model pipeline. Must be called before embedding.\n\t *\n\t * On first use this downloads the ONNX model to `modelCacheDir`.\n\t * Subsequent calls reuse the cached model.\n\t */\n\tasync initialize(): Promise<void> {\n\t\tif (this.extractor) return;\n\t\tif (this.initPromise) return this.initPromise;\n\n\t\tthis.initPromise = (async () => {\n\t\t\ttry {\n\t\t\t\t// Dynamic import — @huggingface/transformers is a heavy dependency\n\t\t\t\tconst { pipeline, env } = await import(\"@huggingface/transformers\");\n\n\t\t\t\t// Direct the model cache to our managed directory\n\t\t\t\tenv.cacheDir = this.modelCacheDir;\n\n\t\t\t\t// Suppress the onnxruntime native addon warning — WASM fallback is fine\n\t\t\t\t// The library tries to load native onnxruntime first and logs a warning\n\t\t\t\t// when it falls back to WASM. We suppress this to avoid confusing users.\n\t\t\t\tconst originalWarn = console.warn;\n\t\t\t\tconsole.warn = (...args: any[]) => {\n\t\t\t\t\tconst msg = typeof args[0] === \"string\" ? args[0] : \"\";\n\t\t\t\t\tif (msg.includes(\"onnxruntime\") || msg.includes(\"ONNX\")) return;\n\t\t\t\t\toriginalWarn.apply(console, args);\n\t\t\t\t};\n\n\t\t\t\ttry {\n\t\t\t\t\tthis.extractor = await pipeline(\"feature-extraction\", this.modelName, {\n\t\t\t\t\t\tdtype: \"q8\" as any,\n\t\t\t\t\t\tdevice: \"cpu\" as any,\n\t\t\t\t\t});\n\t\t\t\t} finally {\n\t\t\t\t\tconsole.warn = originalWarn;\n\t\t\t\t}\n\t\t\t} catch (err) {\n\t\t\t\t// Reset so subsequent calls can retry instead of returning\n\t\t\t\t// the same rejected promise forever\n\t\t\t\tthis.initPromise = null;\n\t\t\t\tthrow err;\n\t\t\t}\n\t\t})();\n\n\t\treturn this.initPromise;\n\t}\n\n\t/**\n\t * Embed documents for indexing.\n\t *\n\t * Applies model-specific prefixes if required, then processes texts\n\t * in batches of `batchSize` for memory efficiency.\n\t */\n\tasync embedDocuments(texts: string[], onProgress?: IndexProgressCallback): Promise<Float32Array[]> {\n\t\tthis.ensureInitialized();\n\n\t\tconst results: Float32Array[] = [];\n\t\tconst total = texts.length;\n\t\tconst prefix = MODEL_PREFIXES[this.modelName]?.document ?? \"\";\n\n\t\tfor (let i = 0; i < total; i += this.batchSize) {\n\t\t\tconst batch = texts.slice(i, i + this.batchSize);\n\t\t\tconst prefixed = prefix ? batch.map((t) => prefix + t) : batch;\n\n\t\t\tconst output = await this.extractor!(prefixed, {\n\t\t\t\tpooling: \"mean\",\n\t\t\t\tnormalize: true,\n\t\t\t});\n\n\t\t\t// output.data is a flat Float32Array of shape [batchLen, dim]\n\t\t\tconst data: Float32Array = output.data;\n\t\t\tconst dim = data.length / batch.length;\n\t\t\tif (this.resolvedDimension === null) {\n\t\t\t\tthis.resolvedDimension = dim;\n\t\t\t}\n\t\t\tfor (let j = 0; j < batch.length; j++) {\n\t\t\t\tconst start = j * dim;\n\t\t\t\tresults.push(data.slice(start, start + dim));\n\t\t\t}\n\n\t\t\tif (onProgress) {\n\t\t\t\tonProgress(\"embedding\", Math.min(i + batch.length, total), total);\n\t\t\t}\n\t\t}\n\n\t\treturn results;\n\t}\n\n\t/**\n\t * Embed a query for search.\n\t *\n\t * Applies model-specific query prefix if required.\n\t */\n\tasync embedQuery(query: string): Promise<Float32Array> {\n\t\tthis.ensureInitialized();\n\n\t\tconst prefix = MODEL_PREFIXES[this.modelName]?.query ?? \"\";\n\t\tconst output = await this.extractor!(prefix + query, {\n\t\t\tpooling: \"mean\",\n\t\t\tnormalize: true,\n\t\t});\n\n\t\t// Single input — derive dimension and slice to exactly one vector\n\t\tconst data: Float32Array = output.data;\n\t\tconst dim = data.length;\n\t\tif (this.resolvedDimension === null) {\n\t\t\tthis.resolvedDimension = dim;\n\t\t}\n\t\treturn data.slice(0, dim);\n\t}\n\n\t/** Get the embedding dimension. Returns the model's actual dimension once known, or 384 as default. */\n\tget dimension(): number {\n\t\treturn this.resolvedDimension ?? DEFAULT_DIMENSION;\n\t}\n\n\t/** Dispose the pipeline to free memory. */\n\tdispose(): void {\n\t\tif (this.extractor) {\n\t\t\t// The pipeline object may have a dispose method depending on the version\n\t\t\tif (typeof this.extractor.dispose === \"function\") {\n\t\t\t\tthis.extractor.dispose();\n\t\t\t}\n\t\t\tthis.extractor = null;\n\t\t}\n\t\tthis.initPromise = null;\n\t}\n\n\t/** Throw if initialize() hasn't been called yet. */\n\tprivate ensureInitialized(): void {\n\t\tif (!this.extractor) {\n\t\t\tthrow new Error(\"Embedder not initialized. Call initialize() first.\");\n\t\t}\n\t}\n}\n"]}
|
|
@@ -1 +1 @@
|
|
|
1
|
-
{"version":3,"file":"scanner.d.ts","sourceRoot":"","sources":["../../../src/core/search/scanner.ts"],"names":[],"mappings":"AAAA;;;;;GAKG;AAMH,OAAO,KAAK,EAAE,QAAQ,EAAE,MAAM,YAAY,CAAC;AAM3C,4DAA4D;AAC5D,MAAM,WAAW,WAAW;IAC3B,4DAA4D;IAC5D,QAAQ,EAAE,MAAM,CAAC;IACjB,0BAA0B;IAC1B,QAAQ,EAAE,QAAQ,CAAC;IACnB,0DAA0D;IAC1D,KAAK,EAAE,MAAM,CAAC;CACd;AAiED;;;GAGG;AACH,wBAAgB,cAAc,CAAC,QAAQ,EAAE,MAAM,GAAG,QAAQ,GAAG,IAAI,CAIhE;AAED;;;;;;GAMG;AACH,wBAAsB,WAAW,CAAC,WAAW,EAAE,MAAM,EAAE,eAAe,CAAC,EAAE,MAAM,GAAG,OAAO,CAAC,WAAW,EAAE,CAAC,CAuBvG","sourcesContent":["/**\n * File scanner for the semantic search subsystem.\n *\n * Discovers project files for indexing by walking the directory tree,\n * respecting .gitignore rules, and classifying files by type.\n */\n\nimport { existsSync, readdirSync, readFileSync, type Stats, statSync } from \"node:fs\";\nimport { homedir } from \"node:os\";\nimport { extname, isAbsolute, join, relative, sep } from \"node:path\";\nimport ignore from \"ignore\";\nimport type { FileType } from \"./types.js\";\n\n// ============================================================================\n// Public types\n// ============================================================================\n\n/** A file discovered by the scanner, ready for indexing. */\nexport interface ScannedFile {\n\t/** Path relative to the project root (posix separators). */\n\tfilePath: string;\n\t/** Detected file type. */\n\tfileType: FileType;\n\t/** File modification time in milliseconds since epoch. */\n\tmtime: number;\n}\n\n// ============================================================================\n// Constants\n// ============================================================================\n\n/** Maximum file size to index (1 MB). */\nconst MAX_FILE_SIZE = 1024 * 1024;\n\n/** Directories unconditionally skipped during traversal. */\nconst SKIP_DIRS = new Set([\n\t\"node_modules\",\n\t\".git\",\n\t\".dreb/index\",\n\t\".hg\",\n\t\".svn\",\n\t\"__pycache__\",\n\t\".tox\",\n\t\".venv\",\n\t\"dist\",\n\t\"build\",\n\t\".next\",\n\t\".nuxt\",\n\t\"coverage\",\n\t\".cache\",\n]);\n\n/** Extension → FileType mapping. */\nconst EXTENSION_MAP: ReadonlyMap<string, FileType> = new Map<string, FileType>([\n\t// Tree-sitter languages\n\t[\".ts\", \"typescript\"],\n\t[\".tsx\", \"tsx\"],\n\t[\".js\", \"javascript\"],\n\t[\".mjs\", \"javascript\"],\n\t[\".cjs\", \"javascript\"],\n\t[\".py\", \"python\"],\n\t[\".go\", \"go\"],\n\t[\".rs\", \"rust\"],\n\t[\".java\", \"java\"],\n\t[\".c\", \"c\"],\n\t[\".h\", \"c\"],\n\t[\".cpp\", \"cpp\"],\n\t[\".hpp\", \"cpp\"],\n\t[\".cc\", \"cpp\"],\n\t[\".cxx\", \"cpp\"],\n\t[\".hh\", \"cpp\"],\n\t[\".hxx\", \"cpp\"],\n\t// Text file types\n\t[\".md\", \"markdown\"],\n\t[\".mdx\", \"markdown\"],\n\t[\".yml\", \"yaml\"],\n\t[\".yaml\", \"yaml\"],\n\t[\".json\", \"json\"],\n\t[\".toml\", \"toml\"],\n\t[\".txt\", \"plaintext\"],\n\t[\".cfg\", \"plaintext\"],\n\t[\".ini\", \"plaintext\"],\n\t[\".env\", \"plaintext\"],\n\t[\".conf\", \"plaintext\"],\n]);\n\n// ============================================================================\n// Public API\n// ============================================================================\n\n/**\n * Detect the {@link FileType} for a file path based on its extension.\n * Returns `null` for unrecognized extensions or files without an extension.\n */\nexport function detectFileType(filePath: string): FileType | null {\n\tconst ext = extname(filePath).toLowerCase();\n\tif (!ext) return null;\n\treturn EXTENSION_MAP.get(ext) ?? null;\n}\n\n/**\n * Scan a project directory and return all indexable files.\n *\n * Walks the tree rooted at {@link projectRoot}, respects `.gitignore` rules,\n * skips binary / oversized files, and optionally includes memory files from\n * a global memory directory.\n */\nexport async function scanProject(projectRoot: string, globalMemoryDir?: string): Promise<ScannedFile[]> {\n\tconst results: ScannedFile[] = [];\n\n\t// Detect if projectRoot is the home directory — use shallow scan mode\n\t// to avoid recursing into the entire home dir (which would be catastrophic).\n\tconst isHomeDir = isHomeDirPath(projectRoot);\n\n\tif (isHomeDir) {\n\t\t// Shallow mode: only scan top-level files and ~/.dreb/memory/\n\t\tscanShallow(projectRoot, results);\n\t} else {\n\t\t// Normal mode: full recursive walk with .gitignore\n\t\tconst ig = ignore();\n\t\tloadGitignore(ig, projectRoot, projectRoot);\n\t\twalkDirectory(projectRoot, projectRoot, ig, results);\n\t}\n\n\t// Include global memory files if the directory exists\n\tif (globalMemoryDir && existsSync(globalMemoryDir)) {\n\t\tscanMemoryDir(globalMemoryDir, projectRoot, results);\n\t}\n\n\treturn results;\n}\n\n/** Check if a path is the user's home directory. */\nfunction isHomeDirPath(dir: string): boolean {\n\ttry {\n\t\tconst home = homedir();\n\t\t// Normalize trailing slashes for comparison\n\t\tconst normalizedDir = dir.replace(/[/\\\\]+$/, \"\");\n\t\tconst normalizedHome = home.replace(/[/\\\\]+$/, \"\");\n\t\treturn normalizedDir === normalizedHome;\n\t} catch {\n\t\treturn false;\n\t}\n}\n\n/**\n * Shallow scan mode for home directory: only index top-level files\n * (no directory recursion) to avoid scanning the entire home directory.\n * Memory files are handled separately via scanMemoryDir.\n */\nfunction scanShallow(dir: string, results: ScannedFile[]): void {\n\tlet entries: string[];\n\ttry {\n\t\tentries = readdirSync(dir);\n\t} catch {\n\t\treturn;\n\t}\n\n\tfor (const entry of entries) {\n\t\t// Skip dotfiles/dotdirs in home dir (except specific ones we want)\n\t\tif (entry.startsWith(\".\")) continue;\n\n\t\tconst fullPath = join(dir, entry);\n\n\t\tlet stats: Stats;\n\t\ttry {\n\t\t\tstats = statSync(fullPath);\n\t\t} catch {\n\t\t\tcontinue;\n\t\t}\n\n\t\t// Only index files, not directories (shallow mode)\n\t\tif (!stats.isFile()) continue;\n\t\tif (stats.size > MAX_FILE_SIZE) continue;\n\t\tif (stats.size === 0) continue;\n\n\t\tconst fileType = detectFileType(entry);\n\t\tif (!fileType) continue;\n\n\t\tresults.push({\n\t\t\tfilePath: entry,\n\t\t\tfileType,\n\t\t\tmtime: stats.mtimeMs,\n\t\t});\n\t}\n}\n\n// ============================================================================\n// Internal helpers\n// ============================================================================\n\ntype IgnoreMatcher = ReturnType<typeof ignore>;\n\n/** Convert an OS path to posix separators for ignore matching. */\nfunction toPosix(p: string): string {\n\treturn p.split(sep).join(\"/\");\n}\n\n/** Load .gitignore rules from a directory into the ignore matcher. */\nfunction loadGitignore(ig: IgnoreMatcher, dir: string, root: string): void {\n\tconst gitignorePath = join(dir, \".gitignore\");\n\tif (!existsSync(gitignorePath)) return;\n\n\ttry {\n\t\tconst content = readFileSync(gitignorePath, \"utf-8\");\n\t\tconst relDir = relative(root, dir);\n\t\tconst prefix = relDir ? `${toPosix(relDir)}/` : \"\";\n\n\t\tconst patterns = content\n\t\t\t.split(/\\r?\\n/)\n\t\t\t.map((line) => prefixPattern(line, prefix))\n\t\t\t.filter((line): line is string => line !== null);\n\n\t\tif (patterns.length > 0) {\n\t\t\tig.add(patterns);\n\t\t}\n\t} catch {\n\t\t// Unreadable .gitignore — skip silently\n\t}\n}\n\n/**\n * Prefix a .gitignore pattern with a directory path so it applies\n * correctly when matching against root-relative paths.\n */\nfunction prefixPattern(line: string, prefix: string): string | null {\n\tconst trimmed = line.trim();\n\tif (!trimmed) return null;\n\tif (trimmed.startsWith(\"#\") && !trimmed.startsWith(\"\\\\#\")) return null;\n\n\tlet pattern = line;\n\tlet negated = false;\n\n\tif (pattern.startsWith(\"!\")) {\n\t\tnegated = true;\n\t\tpattern = pattern.slice(1);\n\t} else if (pattern.startsWith(\"\\\\!\")) {\n\t\tpattern = pattern.slice(1);\n\t}\n\n\tconst prefixed = prefix ? `${prefix}${pattern}` : pattern;\n\treturn negated ? `!${prefixed}` : prefixed;\n}\n\n/**\n * Check if a directory component (relative to root) should be unconditionally skipped.\n * Handles both top-level names (\"node_modules\") and nested paths (\".dreb/index\").\n */\nfunction shouldSkipDir(relPath: string): boolean {\n\tconst posix = toPosix(relPath);\n\n\t// Check the directory name itself\n\tconst parts = posix.split(\"/\");\n\tconst name = parts[parts.length - 1];\n\tif (SKIP_DIRS.has(name)) return true;\n\n\t// Check multi-segment skip patterns (e.g. \".dreb/index\")\n\tfor (const skip of SKIP_DIRS) {\n\t\tif (skip.includes(\"/\") && (posix === skip || posix.endsWith(`/${skip}`))) {\n\t\t\treturn true;\n\t\t}\n\t}\n\n\treturn false;\n}\n\n/** Recursively walk a directory, collecting indexable files. */\nfunction walkDirectory(dir: string, root: string, ig: IgnoreMatcher, results: ScannedFile[]): void {\n\tlet entries: string[];\n\ttry {\n\t\tentries = readdirSync(dir);\n\t} catch {\n\t\treturn; // Permission denied, etc.\n\t}\n\n\tfor (const entry of entries) {\n\t\tconst fullPath = join(dir, entry);\n\t\tconst relPath = relative(root, fullPath);\n\t\tconst posixRel = toPosix(relPath);\n\n\t\tlet stats: Stats;\n\t\ttry {\n\t\t\tstats = statSync(fullPath);\n\t\t} catch {\n\t\t\tcontinue; // Broken symlink, etc.\n\t\t}\n\n\t\tif (stats.isDirectory()) {\n\t\t\t// Hard-coded skip list\n\t\t\tif (shouldSkipDir(relPath)) continue;\n\n\t\t\t// .gitignore check (directories need trailing slash)\n\t\t\tif (ig.ignores(`${posixRel}/`)) continue;\n\n\t\t\t// Load nested .gitignore before descending\n\t\t\tloadGitignore(ig, fullPath, root);\n\n\t\t\twalkDirectory(fullPath, root, ig, results);\n\t\t\tcontinue;\n\t\t}\n\n\t\tif (!stats.isFile()) continue;\n\n\t\t// .gitignore check for files\n\t\tif (ig.ignores(posixRel)) continue;\n\n\t\t// Size gate\n\t\tif (stats.size > MAX_FILE_SIZE) continue;\n\t\tif (stats.size === 0) continue;\n\n\t\t// File type detection\n\t\tconst fileType = detectFileType(entry);\n\t\tif (!fileType) continue;\n\n\t\tresults.push({\n\t\t\tfilePath: posixRel,\n\t\t\tfileType,\n\t\t\tmtime: stats.mtimeMs,\n\t\t});\n\t}\n}\n\n/**\n * Scan a memory directory (project or global) for indexable files.\n *\n * Memory directories are always fully included — no .gitignore filtering —\n * because they live outside the normal project tree or in `.dreb/` which\n * is typically gitignored.\n *\n * Paths for global memory files are stored with a `~memory/` prefix\n * to distinguish them from project files.\n */\nfunction scanMemoryDir(memoryDir: string, projectRoot: string, results: ScannedFile[], baseMemoryDir?: string): void {\n\tlet entries: string[];\n\ttry {\n\t\tentries = readdirSync(memoryDir);\n\t} catch {\n\t\treturn;\n\t}\n\n\tfor (const entry of entries) {\n\t\tconst fullPath = join(memoryDir, entry);\n\n\t\tlet stats: Stats;\n\t\ttry {\n\t\t\tstats = statSync(fullPath);\n\t\t} catch {\n\t\t\tcontinue;\n\t\t}\n\n\t\tif (stats.isDirectory()) {\n\t\t\t// Recurse into subdirectories\n\t\t\tscanMemoryDir(fullPath, projectRoot, results, baseMemoryDir ?? memoryDir);\n\t\t\tcontinue;\n\t\t}\n\n\t\tif (!stats.isFile()) continue;\n\t\tif (stats.size > MAX_FILE_SIZE) continue;\n\t\tif (stats.size === 0) continue;\n\n\t\tconst fileType = detectFileType(entry);\n\t\tif (!fileType) continue;\n\n\t\t// If the memory dir is inside the project root, use normal relative path.\n\t\t// Otherwise, use a ~memory/ prefix so paths remain unique and identifiable.\n\t\tconst rel = relative(projectRoot, fullPath);\n\t\tconst isOutsideProject = rel.startsWith(\"..\") || isAbsolute(rel);\n\t\tconst rootMemoryDir = baseMemoryDir ?? memoryDir;\n\t\tconst filePath = isOutsideProject ? `~memory/${relative(rootMemoryDir, fullPath)}` : rel;\n\n\t\tresults.push({\n\t\t\tfilePath: toPosix(filePath),\n\t\t\tfileType,\n\t\t\tmtime: stats.mtimeMs,\n\t\t});\n\t}\n}\n"]}
|
|
1
|
+
{"version":3,"file":"scanner.d.ts","sourceRoot":"","sources":["../../../src/core/search/scanner.ts"],"names":[],"mappings":"AAAA;;;;;GAKG;AAOH,OAAO,KAAK,EAAE,QAAQ,EAAE,MAAM,YAAY,CAAC;AAM3C,4DAA4D;AAC5D,MAAM,WAAW,WAAW;IAC3B,4DAA4D;IAC5D,QAAQ,EAAE,MAAM,CAAC;IACjB,0BAA0B;IAC1B,QAAQ,EAAE,QAAQ,CAAC;IACnB,0DAA0D;IAC1D,KAAK,EAAE,MAAM,CAAC;CACd;AAiED;;;GAGG;AACH,wBAAgB,cAAc,CAAC,QAAQ,EAAE,MAAM,GAAG,QAAQ,GAAG,IAAI,CAIhE;AAED;;;;;;GAMG;AACH,wBAAsB,WAAW,CAAC,WAAW,EAAE,MAAM,EAAE,eAAe,CAAC,EAAE,MAAM,GAAG,OAAO,CAAC,WAAW,EAAE,CAAC,CAgCvG","sourcesContent":["/**\n * File scanner for the semantic search subsystem.\n *\n * Discovers project files for indexing by walking the directory tree,\n * respecting .gitignore rules, and classifying files by type.\n */\n\nimport { existsSync, readdirSync, readFileSync, type Stats, statSync } from \"node:fs\";\nimport { homedir } from \"node:os\";\nimport { extname, isAbsolute, join, relative, sep } from \"node:path\";\nimport ignore from \"ignore\";\nimport { getDrebToolVisibleDirs } from \"../tools/dreb-paths.js\";\nimport type { FileType } from \"./types.js\";\n\n// ============================================================================\n// Public types\n// ============================================================================\n\n/** A file discovered by the scanner, ready for indexing. */\nexport interface ScannedFile {\n\t/** Path relative to the project root (posix separators). */\n\tfilePath: string;\n\t/** Detected file type. */\n\tfileType: FileType;\n\t/** File modification time in milliseconds since epoch. */\n\tmtime: number;\n}\n\n// ============================================================================\n// Constants\n// ============================================================================\n\n/** Maximum file size to index (1 MB). */\nconst MAX_FILE_SIZE = 1024 * 1024;\n\n/** Directories unconditionally skipped during traversal. */\nconst SKIP_DIRS = new Set([\n\t\"node_modules\",\n\t\".git\",\n\t\".dreb/index\",\n\t\".hg\",\n\t\".svn\",\n\t\"__pycache__\",\n\t\".tox\",\n\t\".venv\",\n\t\"dist\",\n\t\"build\",\n\t\".next\",\n\t\".nuxt\",\n\t\"coverage\",\n\t\".cache\",\n]);\n\n/** Extension → FileType mapping. */\nconst EXTENSION_MAP: ReadonlyMap<string, FileType> = new Map<string, FileType>([\n\t// Tree-sitter languages\n\t[\".ts\", \"typescript\"],\n\t[\".tsx\", \"tsx\"],\n\t[\".js\", \"javascript\"],\n\t[\".mjs\", \"javascript\"],\n\t[\".cjs\", \"javascript\"],\n\t[\".py\", \"python\"],\n\t[\".go\", \"go\"],\n\t[\".rs\", \"rust\"],\n\t[\".java\", \"java\"],\n\t[\".c\", \"c\"],\n\t[\".h\", \"c\"],\n\t[\".cpp\", \"cpp\"],\n\t[\".hpp\", \"cpp\"],\n\t[\".cc\", \"cpp\"],\n\t[\".cxx\", \"cpp\"],\n\t[\".hh\", \"cpp\"],\n\t[\".hxx\", \"cpp\"],\n\t// Text file types\n\t[\".md\", \"markdown\"],\n\t[\".mdx\", \"markdown\"],\n\t[\".yml\", \"yaml\"],\n\t[\".yaml\", \"yaml\"],\n\t[\".json\", \"json\"],\n\t[\".toml\", \"toml\"],\n\t[\".txt\", \"plaintext\"],\n\t[\".cfg\", \"plaintext\"],\n\t[\".ini\", \"plaintext\"],\n\t[\".env\", \"plaintext\"],\n\t[\".conf\", \"plaintext\"],\n]);\n\n// ============================================================================\n// Public API\n// ============================================================================\n\n/**\n * Detect the {@link FileType} for a file path based on its extension.\n * Returns `null` for unrecognized extensions or files without an extension.\n */\nexport function detectFileType(filePath: string): FileType | null {\n\tconst ext = extname(filePath).toLowerCase();\n\tif (!ext) return null;\n\treturn EXTENSION_MAP.get(ext) ?? null;\n}\n\n/**\n * Scan a project directory and return all indexable files.\n *\n * Walks the tree rooted at {@link projectRoot}, respects `.gitignore` rules,\n * skips binary / oversized files, and optionally includes memory files from\n * a global memory directory.\n */\nexport async function scanProject(projectRoot: string, globalMemoryDir?: string): Promise<ScannedFile[]> {\n\tconst results: ScannedFile[] = [];\n\n\t// Detect if projectRoot is the home directory — use shallow scan mode\n\t// to avoid recursing into the entire home dir (which would be catastrophic).\n\tconst isHomeDir = isHomeDirPath(projectRoot);\n\n\tif (isHomeDir) {\n\t\t// Shallow mode: only scan top-level files and ~/.dreb/memory/\n\t\tscanShallow(projectRoot, results);\n\t} else {\n\t\t// Normal mode: full recursive walk with .gitignore\n\t\tconst ig = ignore();\n\t\tloadGitignore(ig, projectRoot, projectRoot);\n\t\twalkDirectory(projectRoot, projectRoot, ig, results);\n\t}\n\n\t// Include tool-visible .dreb/ subdirs (bypasses gitignore).\n\t// In home dir mode, global memory is already handled separately below,\n\t// and we don't want to double-scan ~/.dreb/memory/.\n\tif (!isHomeDir) {\n\t\tfor (const dir of getDrebToolVisibleDirs(projectRoot)) {\n\t\t\tscanMemoryDir(dir, projectRoot, results);\n\t\t}\n\t}\n\n\t// Include global memory files if the directory exists\n\tif (globalMemoryDir && existsSync(globalMemoryDir)) {\n\t\tscanMemoryDir(globalMemoryDir, projectRoot, results);\n\t}\n\n\treturn results;\n}\n\n/** Check if a path is the user's home directory. */\nfunction isHomeDirPath(dir: string): boolean {\n\ttry {\n\t\tconst home = homedir();\n\t\t// Normalize trailing slashes for comparison\n\t\tconst normalizedDir = dir.replace(/[/\\\\]+$/, \"\");\n\t\tconst normalizedHome = home.replace(/[/\\\\]+$/, \"\");\n\t\treturn normalizedDir === normalizedHome;\n\t} catch {\n\t\treturn false;\n\t}\n}\n\n/**\n * Shallow scan mode for home directory: only index top-level files\n * (no directory recursion) to avoid scanning the entire home directory.\n * Memory files are handled separately via scanMemoryDir.\n */\nfunction scanShallow(dir: string, results: ScannedFile[]): void {\n\tlet entries: string[];\n\ttry {\n\t\tentries = readdirSync(dir);\n\t} catch {\n\t\treturn;\n\t}\n\n\tfor (const entry of entries) {\n\t\t// Skip dotfiles/dotdirs in home dir (except specific ones we want)\n\t\tif (entry.startsWith(\".\")) continue;\n\n\t\tconst fullPath = join(dir, entry);\n\n\t\tlet stats: Stats;\n\t\ttry {\n\t\t\tstats = statSync(fullPath);\n\t\t} catch {\n\t\t\tcontinue;\n\t\t}\n\n\t\t// Only index files, not directories (shallow mode)\n\t\tif (!stats.isFile()) continue;\n\t\tif (stats.size > MAX_FILE_SIZE) continue;\n\t\tif (stats.size === 0) continue;\n\n\t\tconst fileType = detectFileType(entry);\n\t\tif (!fileType) continue;\n\n\t\tresults.push({\n\t\t\tfilePath: entry,\n\t\t\tfileType,\n\t\t\tmtime: stats.mtimeMs,\n\t\t});\n\t}\n}\n\n// ============================================================================\n// Internal helpers\n// ============================================================================\n\ntype IgnoreMatcher = ReturnType<typeof ignore>;\n\n/** Convert an OS path to posix separators for ignore matching. */\nfunction toPosix(p: string): string {\n\treturn p.split(sep).join(\"/\");\n}\n\n/** Load .gitignore rules from a directory into the ignore matcher. */\nfunction loadGitignore(ig: IgnoreMatcher, dir: string, root: string): void {\n\tconst gitignorePath = join(dir, \".gitignore\");\n\tif (!existsSync(gitignorePath)) return;\n\n\ttry {\n\t\tconst content = readFileSync(gitignorePath, \"utf-8\");\n\t\tconst relDir = relative(root, dir);\n\t\tconst prefix = relDir ? `${toPosix(relDir)}/` : \"\";\n\n\t\tconst patterns = content\n\t\t\t.split(/\\r?\\n/)\n\t\t\t.map((line) => prefixPattern(line, prefix))\n\t\t\t.filter((line): line is string => line !== null);\n\n\t\tif (patterns.length > 0) {\n\t\t\tig.add(patterns);\n\t\t}\n\t} catch {\n\t\t// Unreadable .gitignore — skip silently\n\t}\n}\n\n/**\n * Prefix a .gitignore pattern with a directory path so it applies\n * correctly when matching against root-relative paths.\n */\nfunction prefixPattern(line: string, prefix: string): string | null {\n\tconst trimmed = line.trim();\n\tif (!trimmed) return null;\n\tif (trimmed.startsWith(\"#\") && !trimmed.startsWith(\"\\\\#\")) return null;\n\n\tlet pattern = line;\n\tlet negated = false;\n\n\tif (pattern.startsWith(\"!\")) {\n\t\tnegated = true;\n\t\tpattern = pattern.slice(1);\n\t} else if (pattern.startsWith(\"\\\\!\")) {\n\t\tpattern = pattern.slice(1);\n\t}\n\n\tconst prefixed = prefix ? `${prefix}${pattern}` : pattern;\n\treturn negated ? `!${prefixed}` : prefixed;\n}\n\n/**\n * Check if a directory component (relative to root) should be unconditionally skipped.\n * Handles both top-level names (\"node_modules\") and nested paths (\".dreb/index\").\n */\nfunction shouldSkipDir(relPath: string): boolean {\n\tconst posix = toPosix(relPath);\n\n\t// Check the directory name itself\n\tconst parts = posix.split(\"/\");\n\tconst name = parts[parts.length - 1];\n\tif (SKIP_DIRS.has(name)) return true;\n\n\t// Check multi-segment skip patterns (e.g. \".dreb/index\")\n\tfor (const skip of SKIP_DIRS) {\n\t\tif (skip.includes(\"/\") && (posix === skip || posix.endsWith(`/${skip}`))) {\n\t\t\treturn true;\n\t\t}\n\t}\n\n\treturn false;\n}\n\n/** Recursively walk a directory, collecting indexable files. */\nfunction walkDirectory(dir: string, root: string, ig: IgnoreMatcher, results: ScannedFile[]): void {\n\tlet entries: string[];\n\ttry {\n\t\tentries = readdirSync(dir);\n\t} catch {\n\t\treturn; // Permission denied, etc.\n\t}\n\n\tfor (const entry of entries) {\n\t\tconst fullPath = join(dir, entry);\n\t\tconst relPath = relative(root, fullPath);\n\t\tconst posixRel = toPosix(relPath);\n\n\t\tlet stats: Stats;\n\t\ttry {\n\t\t\tstats = statSync(fullPath);\n\t\t} catch {\n\t\t\tcontinue; // Broken symlink, etc.\n\t\t}\n\n\t\tif (stats.isDirectory()) {\n\t\t\t// Hard-coded skip list\n\t\t\tif (shouldSkipDir(relPath)) continue;\n\n\t\t\t// .gitignore check (directories need trailing slash)\n\t\t\tif (ig.ignores(`${posixRel}/`)) continue;\n\n\t\t\t// Load nested .gitignore before descending\n\t\t\tloadGitignore(ig, fullPath, root);\n\n\t\t\twalkDirectory(fullPath, root, ig, results);\n\t\t\tcontinue;\n\t\t}\n\n\t\tif (!stats.isFile()) continue;\n\n\t\t// .gitignore check for files\n\t\tif (ig.ignores(posixRel)) continue;\n\n\t\t// Size gate\n\t\tif (stats.size > MAX_FILE_SIZE) continue;\n\t\tif (stats.size === 0) continue;\n\n\t\t// File type detection\n\t\tconst fileType = detectFileType(entry);\n\t\tif (!fileType) continue;\n\n\t\tresults.push({\n\t\t\tfilePath: posixRel,\n\t\t\tfileType,\n\t\t\tmtime: stats.mtimeMs,\n\t\t});\n\t}\n}\n\n/**\n * Scan a memory directory (project or global) for indexable files.\n *\n * Memory directories are always fully included — no .gitignore filtering —\n * because they live outside the normal project tree or in `.dreb/` which\n * is typically gitignored.\n *\n * Paths for global memory files are stored with a `~memory/` prefix\n * to distinguish them from project files.\n */\nfunction scanMemoryDir(memoryDir: string, projectRoot: string, results: ScannedFile[], baseMemoryDir?: string): void {\n\tlet entries: string[];\n\ttry {\n\t\tentries = readdirSync(memoryDir);\n\t} catch {\n\t\treturn;\n\t}\n\n\tfor (const entry of entries) {\n\t\tconst fullPath = join(memoryDir, entry);\n\n\t\tlet stats: Stats;\n\t\ttry {\n\t\t\tstats = statSync(fullPath);\n\t\t} catch {\n\t\t\tcontinue;\n\t\t}\n\n\t\tif (stats.isDirectory()) {\n\t\t\t// Recurse into subdirectories\n\t\t\tscanMemoryDir(fullPath, projectRoot, results, baseMemoryDir ?? memoryDir);\n\t\t\tcontinue;\n\t\t}\n\n\t\tif (!stats.isFile()) continue;\n\t\tif (stats.size > MAX_FILE_SIZE) continue;\n\t\tif (stats.size === 0) continue;\n\n\t\tconst fileType = detectFileType(entry);\n\t\tif (!fileType) continue;\n\n\t\t// If the memory dir is inside the project root, use normal relative path.\n\t\t// Otherwise, use a ~memory/ prefix so paths remain unique and identifiable.\n\t\tconst rel = relative(projectRoot, fullPath);\n\t\tconst isOutsideProject = rel.startsWith(\"..\") || isAbsolute(rel);\n\t\tconst rootMemoryDir = baseMemoryDir ?? memoryDir;\n\t\tconst filePath = isOutsideProject ? `~memory/${relative(rootMemoryDir, fullPath)}` : rel;\n\n\t\tresults.push({\n\t\t\tfilePath: toPosix(filePath),\n\t\t\tfileType,\n\t\t\tmtime: stats.mtimeMs,\n\t\t});\n\t}\n}\n"]}
|
|
@@ -8,6 +8,7 @@ import { existsSync, readdirSync, readFileSync, statSync } from "node:fs";
|
|
|
8
8
|
import { homedir } from "node:os";
|
|
9
9
|
import { extname, isAbsolute, join, relative, sep } from "node:path";
|
|
10
10
|
import ignore from "ignore";
|
|
11
|
+
import { getDrebToolVisibleDirs } from "../tools/dreb-paths.js";
|
|
11
12
|
// ============================================================================
|
|
12
13
|
// Constants
|
|
13
14
|
// ============================================================================
|
|
@@ -98,6 +99,14 @@ export async function scanProject(projectRoot, globalMemoryDir) {
|
|
|
98
99
|
loadGitignore(ig, projectRoot, projectRoot);
|
|
99
100
|
walkDirectory(projectRoot, projectRoot, ig, results);
|
|
100
101
|
}
|
|
102
|
+
// Include tool-visible .dreb/ subdirs (bypasses gitignore).
|
|
103
|
+
// In home dir mode, global memory is already handled separately below,
|
|
104
|
+
// and we don't want to double-scan ~/.dreb/memory/.
|
|
105
|
+
if (!isHomeDir) {
|
|
106
|
+
for (const dir of getDrebToolVisibleDirs(projectRoot)) {
|
|
107
|
+
scanMemoryDir(dir, projectRoot, results);
|
|
108
|
+
}
|
|
109
|
+
}
|
|
101
110
|
// Include global memory files if the directory exists
|
|
102
111
|
if (globalMemoryDir && existsSync(globalMemoryDir)) {
|
|
103
112
|
scanMemoryDir(globalMemoryDir, projectRoot, results);
|
|
@@ -1 +1 @@
|
|
|
1
|
-
{"version":3,"file":"scanner.js","sourceRoot":"","sources":["../../../src/core/search/scanner.ts"],"names":[],"mappings":"AAAA;;;;;GAKG;AAEH,OAAO,EAAE,UAAU,EAAE,WAAW,EAAE,YAAY,EAAc,QAAQ,EAAE,MAAM,SAAS,CAAC;AACtF,OAAO,EAAE,OAAO,EAAE,MAAM,SAAS,CAAC;AAClC,OAAO,EAAE,OAAO,EAAE,UAAU,EAAE,IAAI,EAAE,QAAQ,EAAE,GAAG,EAAE,MAAM,WAAW,CAAC;AACrE,OAAO,MAAM,MAAM,QAAQ,CAAC;AAiB5B,+EAA+E;AAC/E,YAAY;AACZ,+EAA+E;AAE/E,yCAAyC;AACzC,MAAM,aAAa,GAAG,IAAI,GAAG,IAAI,CAAC;AAElC,4DAA4D;AAC5D,MAAM,SAAS,GAAG,IAAI,GAAG,CAAC;IACzB,cAAc;IACd,MAAM;IACN,aAAa;IACb,KAAK;IACL,MAAM;IACN,aAAa;IACb,MAAM;IACN,OAAO;IACP,MAAM;IACN,OAAO;IACP,OAAO;IACP,OAAO;IACP,UAAU;IACV,QAAQ;CACR,CAAC,CAAC;AAEH,sCAAoC;AACpC,MAAM,aAAa,GAAkC,IAAI,GAAG,CAAmB;IAC9E,wBAAwB;IACxB,CAAC,KAAK,EAAE,YAAY,CAAC;IACrB,CAAC,MAAM,EAAE,KAAK,CAAC;IACf,CAAC,KAAK,EAAE,YAAY,CAAC;IACrB,CAAC,MAAM,EAAE,YAAY,CAAC;IACtB,CAAC,MAAM,EAAE,YAAY,CAAC;IACtB,CAAC,KAAK,EAAE,QAAQ,CAAC;IACjB,CAAC,KAAK,EAAE,IAAI,CAAC;IACb,CAAC,KAAK,EAAE,MAAM,CAAC;IACf,CAAC,OAAO,EAAE,MAAM,CAAC;IACjB,CAAC,IAAI,EAAE,GAAG,CAAC;IACX,CAAC,IAAI,EAAE,GAAG,CAAC;IACX,CAAC,MAAM,EAAE,KAAK,CAAC;IACf,CAAC,MAAM,EAAE,KAAK,CAAC;IACf,CAAC,KAAK,EAAE,KAAK,CAAC;IACd,CAAC,MAAM,EAAE,KAAK,CAAC;IACf,CAAC,KAAK,EAAE,KAAK,CAAC;IACd,CAAC,MAAM,EAAE,KAAK,CAAC;IACf,kBAAkB;IAClB,CAAC,KAAK,EAAE,UAAU,CAAC;IACnB,CAAC,MAAM,EAAE,UAAU,CAAC;IACpB,CAAC,MAAM,EAAE,MAAM,CAAC;IAChB,CAAC,OAAO,EAAE,MAAM,CAAC;IACjB,CAAC,OAAO,EAAE,MAAM,CAAC;IACjB,CAAC,OAAO,EAAE,MAAM,CAAC;IACjB,CAAC,MAAM,EAAE,WAAW,CAAC;IACrB,CAAC,MAAM,EAAE,WAAW,CAAC;IACrB,CAAC,MAAM,EAAE,WAAW,CAAC;IACrB,CAAC,MAAM,EAAE,WAAW,CAAC;IACrB,CAAC,OAAO,EAAE,WAAW,CAAC;CACtB,CAAC,CAAC;AAEH,+EAA+E;AAC/E,aAAa;AACb,+EAA+E;AAE/E;;;GAGG;AACH,MAAM,UAAU,cAAc,CAAC,QAAgB,EAAmB;IACjE,MAAM,GAAG,GAAG,OAAO,CAAC,QAAQ,CAAC,CAAC,WAAW,EAAE,CAAC;IAC5C,IAAI,CAAC,GAAG;QAAE,OAAO,IAAI,CAAC;IACtB,OAAO,aAAa,CAAC,GAAG,CAAC,GAAG,CAAC,IAAI,IAAI,CAAC;AAAA,CACtC;AAED;;;;;;GAMG;AACH,MAAM,CAAC,KAAK,UAAU,WAAW,CAAC,WAAmB,EAAE,eAAwB,EAA0B;IACxG,MAAM,OAAO,GAAkB,EAAE,CAAC;IAElC,wEAAsE;IACtE,6EAA6E;IAC7E,MAAM,SAAS,GAAG,aAAa,CAAC,WAAW,CAAC,CAAC;IAE7C,IAAI,SAAS,EAAE,CAAC;QACf,8DAA8D;QAC9D,WAAW,CAAC,WAAW,EAAE,OAAO,CAAC,CAAC;IACnC,CAAC;SAAM,CAAC;QACP,mDAAmD;QACnD,MAAM,EAAE,GAAG,MAAM,EAAE,CAAC;QACpB,aAAa,CAAC,EAAE,EAAE,WAAW,EAAE,WAAW,CAAC,CAAC;QAC5C,aAAa,CAAC,WAAW,EAAE,WAAW,EAAE,EAAE,EAAE,OAAO,CAAC,CAAC;IACtD,CAAC;IAED,sDAAsD;IACtD,IAAI,eAAe,IAAI,UAAU,CAAC,eAAe,CAAC,EAAE,CAAC;QACpD,aAAa,CAAC,eAAe,EAAE,WAAW,EAAE,OAAO,CAAC,CAAC;IACtD,CAAC;IAED,OAAO,OAAO,CAAC;AAAA,CACf;AAED,oDAAoD;AACpD,SAAS,aAAa,CAAC,GAAW,EAAW;IAC5C,IAAI,CAAC;QACJ,MAAM,IAAI,GAAG,OAAO,EAAE,CAAC;QACvB,4CAA4C;QAC5C,MAAM,aAAa,GAAG,GAAG,CAAC,OAAO,CAAC,SAAS,EAAE,EAAE,CAAC,CAAC;QACjD,MAAM,cAAc,GAAG,IAAI,CAAC,OAAO,CAAC,SAAS,EAAE,EAAE,CAAC,CAAC;QACnD,OAAO,aAAa,KAAK,cAAc,CAAC;IACzC,CAAC;IAAC,MAAM,CAAC;QACR,OAAO,KAAK,CAAC;IACd,CAAC;AAAA,CACD;AAED;;;;GAIG;AACH,SAAS,WAAW,CAAC,GAAW,EAAE,OAAsB,EAAQ;IAC/D,IAAI,OAAiB,CAAC;IACtB,IAAI,CAAC;QACJ,OAAO,GAAG,WAAW,CAAC,GAAG,CAAC,CAAC;IAC5B,CAAC;IAAC,MAAM,CAAC;QACR,OAAO;IACR,CAAC;IAED,KAAK,MAAM,KAAK,IAAI,OAAO,EAAE,CAAC;QAC7B,mEAAmE;QACnE,IAAI,KAAK,CAAC,UAAU,CAAC,GAAG,CAAC;YAAE,SAAS;QAEpC,MAAM,QAAQ,GAAG,IAAI,CAAC,GAAG,EAAE,KAAK,CAAC,CAAC;QAElC,IAAI,KAAY,CAAC;QACjB,IAAI,CAAC;YACJ,KAAK,GAAG,QAAQ,CAAC,QAAQ,CAAC,CAAC;QAC5B,CAAC;QAAC,MAAM,CAAC;YACR,SAAS;QACV,CAAC;QAED,mDAAmD;QACnD,IAAI,CAAC,KAAK,CAAC,MAAM,EAAE;YAAE,SAAS;QAC9B,IAAI,KAAK,CAAC,IAAI,GAAG,aAAa;YAAE,SAAS;QACzC,IAAI,KAAK,CAAC,IAAI,KAAK,CAAC;YAAE,SAAS;QAE/B,MAAM,QAAQ,GAAG,cAAc,CAAC,KAAK,CAAC,CAAC;QACvC,IAAI,CAAC,QAAQ;YAAE,SAAS;QAExB,OAAO,CAAC,IAAI,CAAC;YACZ,QAAQ,EAAE,KAAK;YACf,QAAQ;YACR,KAAK,EAAE,KAAK,CAAC,OAAO;SACpB,CAAC,CAAC;IACJ,CAAC;AAAA,CACD;AAQD,kEAAkE;AAClE,SAAS,OAAO,CAAC,CAAS,EAAU;IACnC,OAAO,CAAC,CAAC,KAAK,CAAC,GAAG,CAAC,CAAC,IAAI,CAAC,GAAG,CAAC,CAAC;AAAA,CAC9B;AAED,sEAAsE;AACtE,SAAS,aAAa,CAAC,EAAiB,EAAE,GAAW,EAAE,IAAY,EAAQ;IAC1E,MAAM,aAAa,GAAG,IAAI,CAAC,GAAG,EAAE,YAAY,CAAC,CAAC;IAC9C,IAAI,CAAC,UAAU,CAAC,aAAa,CAAC;QAAE,OAAO;IAEvC,IAAI,CAAC;QACJ,MAAM,OAAO,GAAG,YAAY,CAAC,aAAa,EAAE,OAAO,CAAC,CAAC;QACrD,MAAM,MAAM,GAAG,QAAQ,CAAC,IAAI,EAAE,GAAG,CAAC,CAAC;QACnC,MAAM,MAAM,GAAG,MAAM,CAAC,CAAC,CAAC,GAAG,OAAO,CAAC,MAAM,CAAC,GAAG,CAAC,CAAC,CAAC,EAAE,CAAC;QAEnD,MAAM,QAAQ,GAAG,OAAO;aACtB,KAAK,CAAC,OAAO,CAAC;aACd,GAAG,CAAC,CAAC,IAAI,EAAE,EAAE,CAAC,aAAa,CAAC,IAAI,EAAE,MAAM,CAAC,CAAC;aAC1C,MAAM,CAAC,CAAC,IAAI,EAAkB,EAAE,CAAC,IAAI,KAAK,IAAI,CAAC,CAAC;QAElD,IAAI,QAAQ,CAAC,MAAM,GAAG,CAAC,EAAE,CAAC;YACzB,EAAE,CAAC,GAAG,CAAC,QAAQ,CAAC,CAAC;QAClB,CAAC;IACF,CAAC;IAAC,MAAM,CAAC;QACR,0CAAwC;IACzC,CAAC;AAAA,CACD;AAED;;;GAGG;AACH,SAAS,aAAa,CAAC,IAAY,EAAE,MAAc,EAAiB;IACnE,MAAM,OAAO,GAAG,IAAI,CAAC,IAAI,EAAE,CAAC;IAC5B,IAAI,CAAC,OAAO;QAAE,OAAO,IAAI,CAAC;IAC1B,IAAI,OAAO,CAAC,UAAU,CAAC,GAAG,CAAC,IAAI,CAAC,OAAO,CAAC,UAAU,CAAC,KAAK,CAAC;QAAE,OAAO,IAAI,CAAC;IAEvE,IAAI,OAAO,GAAG,IAAI,CAAC;IACnB,IAAI,OAAO,GAAG,KAAK,CAAC;IAEpB,IAAI,OAAO,CAAC,UAAU,CAAC,GAAG,CAAC,EAAE,CAAC;QAC7B,OAAO,GAAG,IAAI,CAAC;QACf,OAAO,GAAG,OAAO,CAAC,KAAK,CAAC,CAAC,CAAC,CAAC;IAC5B,CAAC;SAAM,IAAI,OAAO,CAAC,UAAU,CAAC,KAAK,CAAC,EAAE,CAAC;QACtC,OAAO,GAAG,OAAO,CAAC,KAAK,CAAC,CAAC,CAAC,CAAC;IAC5B,CAAC;IAED,MAAM,QAAQ,GAAG,MAAM,CAAC,CAAC,CAAC,GAAG,MAAM,GAAG,OAAO,EAAE,CAAC,CAAC,CAAC,OAAO,CAAC;IAC1D,OAAO,OAAO,CAAC,CAAC,CAAC,IAAI,QAAQ,EAAE,CAAC,CAAC,CAAC,QAAQ,CAAC;AAAA,CAC3C;AAED;;;GAGG;AACH,SAAS,aAAa,CAAC,OAAe,EAAW;IAChD,MAAM,KAAK,GAAG,OAAO,CAAC,OAAO,CAAC,CAAC;IAE/B,kCAAkC;IAClC,MAAM,KAAK,GAAG,KAAK,CAAC,KAAK,CAAC,GAAG,CAAC,CAAC;IAC/B,MAAM,IAAI,GAAG,KAAK,CAAC,KAAK,CAAC,MAAM,GAAG,CAAC,CAAC,CAAC;IACrC,IAAI,SAAS,CAAC,GAAG,CAAC,IAAI,CAAC;QAAE,OAAO,IAAI,CAAC;IAErC,yDAAyD;IACzD,KAAK,MAAM,IAAI,IAAI,SAAS,EAAE,CAAC;QAC9B,IAAI,IAAI,CAAC,QAAQ,CAAC,GAAG,CAAC,IAAI,CAAC,KAAK,KAAK,IAAI,IAAI,KAAK,CAAC,QAAQ,CAAC,IAAI,IAAI,EAAE,CAAC,CAAC,EAAE,CAAC;YAC1E,OAAO,IAAI,CAAC;QACb,CAAC;IACF,CAAC;IAED,OAAO,KAAK,CAAC;AAAA,CACb;AAED,gEAAgE;AAChE,SAAS,aAAa,CAAC,GAAW,EAAE,IAAY,EAAE,EAAiB,EAAE,OAAsB,EAAQ;IAClG,IAAI,OAAiB,CAAC;IACtB,IAAI,CAAC;QACJ,OAAO,GAAG,WAAW,CAAC,GAAG,CAAC,CAAC;IAC5B,CAAC;IAAC,MAAM,CAAC;QACR,OAAO,CAAC,0BAA0B;IACnC,CAAC;IAED,KAAK,MAAM,KAAK,IAAI,OAAO,EAAE,CAAC;QAC7B,MAAM,QAAQ,GAAG,IAAI,CAAC,GAAG,EAAE,KAAK,CAAC,CAAC;QAClC,MAAM,OAAO,GAAG,QAAQ,CAAC,IAAI,EAAE,QAAQ,CAAC,CAAC;QACzC,MAAM,QAAQ,GAAG,OAAO,CAAC,OAAO,CAAC,CAAC;QAElC,IAAI,KAAY,CAAC;QACjB,IAAI,CAAC;YACJ,KAAK,GAAG,QAAQ,CAAC,QAAQ,CAAC,CAAC;QAC5B,CAAC;QAAC,MAAM,CAAC;YACR,SAAS,CAAC,uBAAuB;QAClC,CAAC;QAED,IAAI,KAAK,CAAC,WAAW,EAAE,EAAE,CAAC;YACzB,uBAAuB;YACvB,IAAI,aAAa,CAAC,OAAO,CAAC;gBAAE,SAAS;YAErC,qDAAqD;YACrD,IAAI,EAAE,CAAC,OAAO,CAAC,GAAG,QAAQ,GAAG,CAAC;gBAAE,SAAS;YAEzC,2CAA2C;YAC3C,aAAa,CAAC,EAAE,EAAE,QAAQ,EAAE,IAAI,CAAC,CAAC;YAElC,aAAa,CAAC,QAAQ,EAAE,IAAI,EAAE,EAAE,EAAE,OAAO,CAAC,CAAC;YAC3C,SAAS;QACV,CAAC;QAED,IAAI,CAAC,KAAK,CAAC,MAAM,EAAE;YAAE,SAAS;QAE9B,6BAA6B;QAC7B,IAAI,EAAE,CAAC,OAAO,CAAC,QAAQ,CAAC;YAAE,SAAS;QAEnC,YAAY;QACZ,IAAI,KAAK,CAAC,IAAI,GAAG,aAAa;YAAE,SAAS;QACzC,IAAI,KAAK,CAAC,IAAI,KAAK,CAAC;YAAE,SAAS;QAE/B,sBAAsB;QACtB,MAAM,QAAQ,GAAG,cAAc,CAAC,KAAK,CAAC,CAAC;QACvC,IAAI,CAAC,QAAQ;YAAE,SAAS;QAExB,OAAO,CAAC,IAAI,CAAC;YACZ,QAAQ,EAAE,QAAQ;YAClB,QAAQ;YACR,KAAK,EAAE,KAAK,CAAC,OAAO;SACpB,CAAC,CAAC;IACJ,CAAC;AAAA,CACD;AAED;;;;;;;;;GASG;AACH,SAAS,aAAa,CAAC,SAAiB,EAAE,WAAmB,EAAE,OAAsB,EAAE,aAAsB,EAAQ;IACpH,IAAI,OAAiB,CAAC;IACtB,IAAI,CAAC;QACJ,OAAO,GAAG,WAAW,CAAC,SAAS,CAAC,CAAC;IAClC,CAAC;IAAC,MAAM,CAAC;QACR,OAAO;IACR,CAAC;IAED,KAAK,MAAM,KAAK,IAAI,OAAO,EAAE,CAAC;QAC7B,MAAM,QAAQ,GAAG,IAAI,CAAC,SAAS,EAAE,KAAK,CAAC,CAAC;QAExC,IAAI,KAAY,CAAC;QACjB,IAAI,CAAC;YACJ,KAAK,GAAG,QAAQ,CAAC,QAAQ,CAAC,CAAC;QAC5B,CAAC;QAAC,MAAM,CAAC;YACR,SAAS;QACV,CAAC;QAED,IAAI,KAAK,CAAC,WAAW,EAAE,EAAE,CAAC;YACzB,8BAA8B;YAC9B,aAAa,CAAC,QAAQ,EAAE,WAAW,EAAE,OAAO,EAAE,aAAa,IAAI,SAAS,CAAC,CAAC;YAC1E,SAAS;QACV,CAAC;QAED,IAAI,CAAC,KAAK,CAAC,MAAM,EAAE;YAAE,SAAS;QAC9B,IAAI,KAAK,CAAC,IAAI,GAAG,aAAa;YAAE,SAAS;QACzC,IAAI,KAAK,CAAC,IAAI,KAAK,CAAC;YAAE,SAAS;QAE/B,MAAM,QAAQ,GAAG,cAAc,CAAC,KAAK,CAAC,CAAC;QACvC,IAAI,CAAC,QAAQ;YAAE,SAAS;QAExB,0EAA0E;QAC1E,4EAA4E;QAC5E,MAAM,GAAG,GAAG,QAAQ,CAAC,WAAW,EAAE,QAAQ,CAAC,CAAC;QAC5C,MAAM,gBAAgB,GAAG,GAAG,CAAC,UAAU,CAAC,IAAI,CAAC,IAAI,UAAU,CAAC,GAAG,CAAC,CAAC;QACjE,MAAM,aAAa,GAAG,aAAa,IAAI,SAAS,CAAC;QACjD,MAAM,QAAQ,GAAG,gBAAgB,CAAC,CAAC,CAAC,WAAW,QAAQ,CAAC,aAAa,EAAE,QAAQ,CAAC,EAAE,CAAC,CAAC,CAAC,GAAG,CAAC;QAEzF,OAAO,CAAC,IAAI,CAAC;YACZ,QAAQ,EAAE,OAAO,CAAC,QAAQ,CAAC;YAC3B,QAAQ;YACR,KAAK,EAAE,KAAK,CAAC,OAAO;SACpB,CAAC,CAAC;IACJ,CAAC;AAAA,CACD","sourcesContent":["/**\n * File scanner for the semantic search subsystem.\n *\n * Discovers project files for indexing by walking the directory tree,\n * respecting .gitignore rules, and classifying files by type.\n */\n\nimport { existsSync, readdirSync, readFileSync, type Stats, statSync } from \"node:fs\";\nimport { homedir } from \"node:os\";\nimport { extname, isAbsolute, join, relative, sep } from \"node:path\";\nimport ignore from \"ignore\";\nimport type { FileType } from \"./types.js\";\n\n// ============================================================================\n// Public types\n// ============================================================================\n\n/** A file discovered by the scanner, ready for indexing. */\nexport interface ScannedFile {\n\t/** Path relative to the project root (posix separators). */\n\tfilePath: string;\n\t/** Detected file type. */\n\tfileType: FileType;\n\t/** File modification time in milliseconds since epoch. */\n\tmtime: number;\n}\n\n// ============================================================================\n// Constants\n// ============================================================================\n\n/** Maximum file size to index (1 MB). */\nconst MAX_FILE_SIZE = 1024 * 1024;\n\n/** Directories unconditionally skipped during traversal. */\nconst SKIP_DIRS = new Set([\n\t\"node_modules\",\n\t\".git\",\n\t\".dreb/index\",\n\t\".hg\",\n\t\".svn\",\n\t\"__pycache__\",\n\t\".tox\",\n\t\".venv\",\n\t\"dist\",\n\t\"build\",\n\t\".next\",\n\t\".nuxt\",\n\t\"coverage\",\n\t\".cache\",\n]);\n\n/** Extension → FileType mapping. */\nconst EXTENSION_MAP: ReadonlyMap<string, FileType> = new Map<string, FileType>([\n\t// Tree-sitter languages\n\t[\".ts\", \"typescript\"],\n\t[\".tsx\", \"tsx\"],\n\t[\".js\", \"javascript\"],\n\t[\".mjs\", \"javascript\"],\n\t[\".cjs\", \"javascript\"],\n\t[\".py\", \"python\"],\n\t[\".go\", \"go\"],\n\t[\".rs\", \"rust\"],\n\t[\".java\", \"java\"],\n\t[\".c\", \"c\"],\n\t[\".h\", \"c\"],\n\t[\".cpp\", \"cpp\"],\n\t[\".hpp\", \"cpp\"],\n\t[\".cc\", \"cpp\"],\n\t[\".cxx\", \"cpp\"],\n\t[\".hh\", \"cpp\"],\n\t[\".hxx\", \"cpp\"],\n\t// Text file types\n\t[\".md\", \"markdown\"],\n\t[\".mdx\", \"markdown\"],\n\t[\".yml\", \"yaml\"],\n\t[\".yaml\", \"yaml\"],\n\t[\".json\", \"json\"],\n\t[\".toml\", \"toml\"],\n\t[\".txt\", \"plaintext\"],\n\t[\".cfg\", \"plaintext\"],\n\t[\".ini\", \"plaintext\"],\n\t[\".env\", \"plaintext\"],\n\t[\".conf\", \"plaintext\"],\n]);\n\n// ============================================================================\n// Public API\n// ============================================================================\n\n/**\n * Detect the {@link FileType} for a file path based on its extension.\n * Returns `null` for unrecognized extensions or files without an extension.\n */\nexport function detectFileType(filePath: string): FileType | null {\n\tconst ext = extname(filePath).toLowerCase();\n\tif (!ext) return null;\n\treturn EXTENSION_MAP.get(ext) ?? null;\n}\n\n/**\n * Scan a project directory and return all indexable files.\n *\n * Walks the tree rooted at {@link projectRoot}, respects `.gitignore` rules,\n * skips binary / oversized files, and optionally includes memory files from\n * a global memory directory.\n */\nexport async function scanProject(projectRoot: string, globalMemoryDir?: string): Promise<ScannedFile[]> {\n\tconst results: ScannedFile[] = [];\n\n\t// Detect if projectRoot is the home directory — use shallow scan mode\n\t// to avoid recursing into the entire home dir (which would be catastrophic).\n\tconst isHomeDir = isHomeDirPath(projectRoot);\n\n\tif (isHomeDir) {\n\t\t// Shallow mode: only scan top-level files and ~/.dreb/memory/\n\t\tscanShallow(projectRoot, results);\n\t} else {\n\t\t// Normal mode: full recursive walk with .gitignore\n\t\tconst ig = ignore();\n\t\tloadGitignore(ig, projectRoot, projectRoot);\n\t\twalkDirectory(projectRoot, projectRoot, ig, results);\n\t}\n\n\t// Include global memory files if the directory exists\n\tif (globalMemoryDir && existsSync(globalMemoryDir)) {\n\t\tscanMemoryDir(globalMemoryDir, projectRoot, results);\n\t}\n\n\treturn results;\n}\n\n/** Check if a path is the user's home directory. */\nfunction isHomeDirPath(dir: string): boolean {\n\ttry {\n\t\tconst home = homedir();\n\t\t// Normalize trailing slashes for comparison\n\t\tconst normalizedDir = dir.replace(/[/\\\\]+$/, \"\");\n\t\tconst normalizedHome = home.replace(/[/\\\\]+$/, \"\");\n\t\treturn normalizedDir === normalizedHome;\n\t} catch {\n\t\treturn false;\n\t}\n}\n\n/**\n * Shallow scan mode for home directory: only index top-level files\n * (no directory recursion) to avoid scanning the entire home directory.\n * Memory files are handled separately via scanMemoryDir.\n */\nfunction scanShallow(dir: string, results: ScannedFile[]): void {\n\tlet entries: string[];\n\ttry {\n\t\tentries = readdirSync(dir);\n\t} catch {\n\t\treturn;\n\t}\n\n\tfor (const entry of entries) {\n\t\t// Skip dotfiles/dotdirs in home dir (except specific ones we want)\n\t\tif (entry.startsWith(\".\")) continue;\n\n\t\tconst fullPath = join(dir, entry);\n\n\t\tlet stats: Stats;\n\t\ttry {\n\t\t\tstats = statSync(fullPath);\n\t\t} catch {\n\t\t\tcontinue;\n\t\t}\n\n\t\t// Only index files, not directories (shallow mode)\n\t\tif (!stats.isFile()) continue;\n\t\tif (stats.size > MAX_FILE_SIZE) continue;\n\t\tif (stats.size === 0) continue;\n\n\t\tconst fileType = detectFileType(entry);\n\t\tif (!fileType) continue;\n\n\t\tresults.push({\n\t\t\tfilePath: entry,\n\t\t\tfileType,\n\t\t\tmtime: stats.mtimeMs,\n\t\t});\n\t}\n}\n\n// ============================================================================\n// Internal helpers\n// ============================================================================\n\ntype IgnoreMatcher = ReturnType<typeof ignore>;\n\n/** Convert an OS path to posix separators for ignore matching. */\nfunction toPosix(p: string): string {\n\treturn p.split(sep).join(\"/\");\n}\n\n/** Load .gitignore rules from a directory into the ignore matcher. */\nfunction loadGitignore(ig: IgnoreMatcher, dir: string, root: string): void {\n\tconst gitignorePath = join(dir, \".gitignore\");\n\tif (!existsSync(gitignorePath)) return;\n\n\ttry {\n\t\tconst content = readFileSync(gitignorePath, \"utf-8\");\n\t\tconst relDir = relative(root, dir);\n\t\tconst prefix = relDir ? `${toPosix(relDir)}/` : \"\";\n\n\t\tconst patterns = content\n\t\t\t.split(/\\r?\\n/)\n\t\t\t.map((line) => prefixPattern(line, prefix))\n\t\t\t.filter((line): line is string => line !== null);\n\n\t\tif (patterns.length > 0) {\n\t\t\tig.add(patterns);\n\t\t}\n\t} catch {\n\t\t// Unreadable .gitignore — skip silently\n\t}\n}\n\n/**\n * Prefix a .gitignore pattern with a directory path so it applies\n * correctly when matching against root-relative paths.\n */\nfunction prefixPattern(line: string, prefix: string): string | null {\n\tconst trimmed = line.trim();\n\tif (!trimmed) return null;\n\tif (trimmed.startsWith(\"#\") && !trimmed.startsWith(\"\\\\#\")) return null;\n\n\tlet pattern = line;\n\tlet negated = false;\n\n\tif (pattern.startsWith(\"!\")) {\n\t\tnegated = true;\n\t\tpattern = pattern.slice(1);\n\t} else if (pattern.startsWith(\"\\\\!\")) {\n\t\tpattern = pattern.slice(1);\n\t}\n\n\tconst prefixed = prefix ? `${prefix}${pattern}` : pattern;\n\treturn negated ? `!${prefixed}` : prefixed;\n}\n\n/**\n * Check if a directory component (relative to root) should be unconditionally skipped.\n * Handles both top-level names (\"node_modules\") and nested paths (\".dreb/index\").\n */\nfunction shouldSkipDir(relPath: string): boolean {\n\tconst posix = toPosix(relPath);\n\n\t// Check the directory name itself\n\tconst parts = posix.split(\"/\");\n\tconst name = parts[parts.length - 1];\n\tif (SKIP_DIRS.has(name)) return true;\n\n\t// Check multi-segment skip patterns (e.g. \".dreb/index\")\n\tfor (const skip of SKIP_DIRS) {\n\t\tif (skip.includes(\"/\") && (posix === skip || posix.endsWith(`/${skip}`))) {\n\t\t\treturn true;\n\t\t}\n\t}\n\n\treturn false;\n}\n\n/** Recursively walk a directory, collecting indexable files. */\nfunction walkDirectory(dir: string, root: string, ig: IgnoreMatcher, results: ScannedFile[]): void {\n\tlet entries: string[];\n\ttry {\n\t\tentries = readdirSync(dir);\n\t} catch {\n\t\treturn; // Permission denied, etc.\n\t}\n\n\tfor (const entry of entries) {\n\t\tconst fullPath = join(dir, entry);\n\t\tconst relPath = relative(root, fullPath);\n\t\tconst posixRel = toPosix(relPath);\n\n\t\tlet stats: Stats;\n\t\ttry {\n\t\t\tstats = statSync(fullPath);\n\t\t} catch {\n\t\t\tcontinue; // Broken symlink, etc.\n\t\t}\n\n\t\tif (stats.isDirectory()) {\n\t\t\t// Hard-coded skip list\n\t\t\tif (shouldSkipDir(relPath)) continue;\n\n\t\t\t// .gitignore check (directories need trailing slash)\n\t\t\tif (ig.ignores(`${posixRel}/`)) continue;\n\n\t\t\t// Load nested .gitignore before descending\n\t\t\tloadGitignore(ig, fullPath, root);\n\n\t\t\twalkDirectory(fullPath, root, ig, results);\n\t\t\tcontinue;\n\t\t}\n\n\t\tif (!stats.isFile()) continue;\n\n\t\t// .gitignore check for files\n\t\tif (ig.ignores(posixRel)) continue;\n\n\t\t// Size gate\n\t\tif (stats.size > MAX_FILE_SIZE) continue;\n\t\tif (stats.size === 0) continue;\n\n\t\t// File type detection\n\t\tconst fileType = detectFileType(entry);\n\t\tif (!fileType) continue;\n\n\t\tresults.push({\n\t\t\tfilePath: posixRel,\n\t\t\tfileType,\n\t\t\tmtime: stats.mtimeMs,\n\t\t});\n\t}\n}\n\n/**\n * Scan a memory directory (project or global) for indexable files.\n *\n * Memory directories are always fully included — no .gitignore filtering —\n * because they live outside the normal project tree or in `.dreb/` which\n * is typically gitignored.\n *\n * Paths for global memory files are stored with a `~memory/` prefix\n * to distinguish them from project files.\n */\nfunction scanMemoryDir(memoryDir: string, projectRoot: string, results: ScannedFile[], baseMemoryDir?: string): void {\n\tlet entries: string[];\n\ttry {\n\t\tentries = readdirSync(memoryDir);\n\t} catch {\n\t\treturn;\n\t}\n\n\tfor (const entry of entries) {\n\t\tconst fullPath = join(memoryDir, entry);\n\n\t\tlet stats: Stats;\n\t\ttry {\n\t\t\tstats = statSync(fullPath);\n\t\t} catch {\n\t\t\tcontinue;\n\t\t}\n\n\t\tif (stats.isDirectory()) {\n\t\t\t// Recurse into subdirectories\n\t\t\tscanMemoryDir(fullPath, projectRoot, results, baseMemoryDir ?? memoryDir);\n\t\t\tcontinue;\n\t\t}\n\n\t\tif (!stats.isFile()) continue;\n\t\tif (stats.size > MAX_FILE_SIZE) continue;\n\t\tif (stats.size === 0) continue;\n\n\t\tconst fileType = detectFileType(entry);\n\t\tif (!fileType) continue;\n\n\t\t// If the memory dir is inside the project root, use normal relative path.\n\t\t// Otherwise, use a ~memory/ prefix so paths remain unique and identifiable.\n\t\tconst rel = relative(projectRoot, fullPath);\n\t\tconst isOutsideProject = rel.startsWith(\"..\") || isAbsolute(rel);\n\t\tconst rootMemoryDir = baseMemoryDir ?? memoryDir;\n\t\tconst filePath = isOutsideProject ? `~memory/${relative(rootMemoryDir, fullPath)}` : rel;\n\n\t\tresults.push({\n\t\t\tfilePath: toPosix(filePath),\n\t\t\tfileType,\n\t\t\tmtime: stats.mtimeMs,\n\t\t});\n\t}\n}\n"]}
|
|
1
|
+
{"version":3,"file":"scanner.js","sourceRoot":"","sources":["../../../src/core/search/scanner.ts"],"names":[],"mappings":"AAAA;;;;;GAKG;AAEH,OAAO,EAAE,UAAU,EAAE,WAAW,EAAE,YAAY,EAAc,QAAQ,EAAE,MAAM,SAAS,CAAC;AACtF,OAAO,EAAE,OAAO,EAAE,MAAM,SAAS,CAAC;AAClC,OAAO,EAAE,OAAO,EAAE,UAAU,EAAE,IAAI,EAAE,QAAQ,EAAE,GAAG,EAAE,MAAM,WAAW,CAAC;AACrE,OAAO,MAAM,MAAM,QAAQ,CAAC;AAC5B,OAAO,EAAE,sBAAsB,EAAE,MAAM,wBAAwB,CAAC;AAiBhE,+EAA+E;AAC/E,YAAY;AACZ,+EAA+E;AAE/E,yCAAyC;AACzC,MAAM,aAAa,GAAG,IAAI,GAAG,IAAI,CAAC;AAElC,4DAA4D;AAC5D,MAAM,SAAS,GAAG,IAAI,GAAG,CAAC;IACzB,cAAc;IACd,MAAM;IACN,aAAa;IACb,KAAK;IACL,MAAM;IACN,aAAa;IACb,MAAM;IACN,OAAO;IACP,MAAM;IACN,OAAO;IACP,OAAO;IACP,OAAO;IACP,UAAU;IACV,QAAQ;CACR,CAAC,CAAC;AAEH,sCAAoC;AACpC,MAAM,aAAa,GAAkC,IAAI,GAAG,CAAmB;IAC9E,wBAAwB;IACxB,CAAC,KAAK,EAAE,YAAY,CAAC;IACrB,CAAC,MAAM,EAAE,KAAK,CAAC;IACf,CAAC,KAAK,EAAE,YAAY,CAAC;IACrB,CAAC,MAAM,EAAE,YAAY,CAAC;IACtB,CAAC,MAAM,EAAE,YAAY,CAAC;IACtB,CAAC,KAAK,EAAE,QAAQ,CAAC;IACjB,CAAC,KAAK,EAAE,IAAI,CAAC;IACb,CAAC,KAAK,EAAE,MAAM,CAAC;IACf,CAAC,OAAO,EAAE,MAAM,CAAC;IACjB,CAAC,IAAI,EAAE,GAAG,CAAC;IACX,CAAC,IAAI,EAAE,GAAG,CAAC;IACX,CAAC,MAAM,EAAE,KAAK,CAAC;IACf,CAAC,MAAM,EAAE,KAAK,CAAC;IACf,CAAC,KAAK,EAAE,KAAK,CAAC;IACd,CAAC,MAAM,EAAE,KAAK,CAAC;IACf,CAAC,KAAK,EAAE,KAAK,CAAC;IACd,CAAC,MAAM,EAAE,KAAK,CAAC;IACf,kBAAkB;IAClB,CAAC,KAAK,EAAE,UAAU,CAAC;IACnB,CAAC,MAAM,EAAE,UAAU,CAAC;IACpB,CAAC,MAAM,EAAE,MAAM,CAAC;IAChB,CAAC,OAAO,EAAE,MAAM,CAAC;IACjB,CAAC,OAAO,EAAE,MAAM,CAAC;IACjB,CAAC,OAAO,EAAE,MAAM,CAAC;IACjB,CAAC,MAAM,EAAE,WAAW,CAAC;IACrB,CAAC,MAAM,EAAE,WAAW,CAAC;IACrB,CAAC,MAAM,EAAE,WAAW,CAAC;IACrB,CAAC,MAAM,EAAE,WAAW,CAAC;IACrB,CAAC,OAAO,EAAE,WAAW,CAAC;CACtB,CAAC,CAAC;AAEH,+EAA+E;AAC/E,aAAa;AACb,+EAA+E;AAE/E;;;GAGG;AACH,MAAM,UAAU,cAAc,CAAC,QAAgB,EAAmB;IACjE,MAAM,GAAG,GAAG,OAAO,CAAC,QAAQ,CAAC,CAAC,WAAW,EAAE,CAAC;IAC5C,IAAI,CAAC,GAAG;QAAE,OAAO,IAAI,CAAC;IACtB,OAAO,aAAa,CAAC,GAAG,CAAC,GAAG,CAAC,IAAI,IAAI,CAAC;AAAA,CACtC;AAED;;;;;;GAMG;AACH,MAAM,CAAC,KAAK,UAAU,WAAW,CAAC,WAAmB,EAAE,eAAwB,EAA0B;IACxG,MAAM,OAAO,GAAkB,EAAE,CAAC;IAElC,wEAAsE;IACtE,6EAA6E;IAC7E,MAAM,SAAS,GAAG,aAAa,CAAC,WAAW,CAAC,CAAC;IAE7C,IAAI,SAAS,EAAE,CAAC;QACf,8DAA8D;QAC9D,WAAW,CAAC,WAAW,EAAE,OAAO,CAAC,CAAC;IACnC,CAAC;SAAM,CAAC;QACP,mDAAmD;QACnD,MAAM,EAAE,GAAG,MAAM,EAAE,CAAC;QACpB,aAAa,CAAC,EAAE,EAAE,WAAW,EAAE,WAAW,CAAC,CAAC;QAC5C,aAAa,CAAC,WAAW,EAAE,WAAW,EAAE,EAAE,EAAE,OAAO,CAAC,CAAC;IACtD,CAAC;IAED,4DAA4D;IAC5D,uEAAuE;IACvE,oDAAoD;IACpD,IAAI,CAAC,SAAS,EAAE,CAAC;QAChB,KAAK,MAAM,GAAG,IAAI,sBAAsB,CAAC,WAAW,CAAC,EAAE,CAAC;YACvD,aAAa,CAAC,GAAG,EAAE,WAAW,EAAE,OAAO,CAAC,CAAC;QAC1C,CAAC;IACF,CAAC;IAED,sDAAsD;IACtD,IAAI,eAAe,IAAI,UAAU,CAAC,eAAe,CAAC,EAAE,CAAC;QACpD,aAAa,CAAC,eAAe,EAAE,WAAW,EAAE,OAAO,CAAC,CAAC;IACtD,CAAC;IAED,OAAO,OAAO,CAAC;AAAA,CACf;AAED,oDAAoD;AACpD,SAAS,aAAa,CAAC,GAAW,EAAW;IAC5C,IAAI,CAAC;QACJ,MAAM,IAAI,GAAG,OAAO,EAAE,CAAC;QACvB,4CAA4C;QAC5C,MAAM,aAAa,GAAG,GAAG,CAAC,OAAO,CAAC,SAAS,EAAE,EAAE,CAAC,CAAC;QACjD,MAAM,cAAc,GAAG,IAAI,CAAC,OAAO,CAAC,SAAS,EAAE,EAAE,CAAC,CAAC;QACnD,OAAO,aAAa,KAAK,cAAc,CAAC;IACzC,CAAC;IAAC,MAAM,CAAC;QACR,OAAO,KAAK,CAAC;IACd,CAAC;AAAA,CACD;AAED;;;;GAIG;AACH,SAAS,WAAW,CAAC,GAAW,EAAE,OAAsB,EAAQ;IAC/D,IAAI,OAAiB,CAAC;IACtB,IAAI,CAAC;QACJ,OAAO,GAAG,WAAW,CAAC,GAAG,CAAC,CAAC;IAC5B,CAAC;IAAC,MAAM,CAAC;QACR,OAAO;IACR,CAAC;IAED,KAAK,MAAM,KAAK,IAAI,OAAO,EAAE,CAAC;QAC7B,mEAAmE;QACnE,IAAI,KAAK,CAAC,UAAU,CAAC,GAAG,CAAC;YAAE,SAAS;QAEpC,MAAM,QAAQ,GAAG,IAAI,CAAC,GAAG,EAAE,KAAK,CAAC,CAAC;QAElC,IAAI,KAAY,CAAC;QACjB,IAAI,CAAC;YACJ,KAAK,GAAG,QAAQ,CAAC,QAAQ,CAAC,CAAC;QAC5B,CAAC;QAAC,MAAM,CAAC;YACR,SAAS;QACV,CAAC;QAED,mDAAmD;QACnD,IAAI,CAAC,KAAK,CAAC,MAAM,EAAE;YAAE,SAAS;QAC9B,IAAI,KAAK,CAAC,IAAI,GAAG,aAAa;YAAE,SAAS;QACzC,IAAI,KAAK,CAAC,IAAI,KAAK,CAAC;YAAE,SAAS;QAE/B,MAAM,QAAQ,GAAG,cAAc,CAAC,KAAK,CAAC,CAAC;QACvC,IAAI,CAAC,QAAQ;YAAE,SAAS;QAExB,OAAO,CAAC,IAAI,CAAC;YACZ,QAAQ,EAAE,KAAK;YACf,QAAQ;YACR,KAAK,EAAE,KAAK,CAAC,OAAO;SACpB,CAAC,CAAC;IACJ,CAAC;AAAA,CACD;AAQD,kEAAkE;AAClE,SAAS,OAAO,CAAC,CAAS,EAAU;IACnC,OAAO,CAAC,CAAC,KAAK,CAAC,GAAG,CAAC,CAAC,IAAI,CAAC,GAAG,CAAC,CAAC;AAAA,CAC9B;AAED,sEAAsE;AACtE,SAAS,aAAa,CAAC,EAAiB,EAAE,GAAW,EAAE,IAAY,EAAQ;IAC1E,MAAM,aAAa,GAAG,IAAI,CAAC,GAAG,EAAE,YAAY,CAAC,CAAC;IAC9C,IAAI,CAAC,UAAU,CAAC,aAAa,CAAC;QAAE,OAAO;IAEvC,IAAI,CAAC;QACJ,MAAM,OAAO,GAAG,YAAY,CAAC,aAAa,EAAE,OAAO,CAAC,CAAC;QACrD,MAAM,MAAM,GAAG,QAAQ,CAAC,IAAI,EAAE,GAAG,CAAC,CAAC;QACnC,MAAM,MAAM,GAAG,MAAM,CAAC,CAAC,CAAC,GAAG,OAAO,CAAC,MAAM,CAAC,GAAG,CAAC,CAAC,CAAC,EAAE,CAAC;QAEnD,MAAM,QAAQ,GAAG,OAAO;aACtB,KAAK,CAAC,OAAO,CAAC;aACd,GAAG,CAAC,CAAC,IAAI,EAAE,EAAE,CAAC,aAAa,CAAC,IAAI,EAAE,MAAM,CAAC,CAAC;aAC1C,MAAM,CAAC,CAAC,IAAI,EAAkB,EAAE,CAAC,IAAI,KAAK,IAAI,CAAC,CAAC;QAElD,IAAI,QAAQ,CAAC,MAAM,GAAG,CAAC,EAAE,CAAC;YACzB,EAAE,CAAC,GAAG,CAAC,QAAQ,CAAC,CAAC;QAClB,CAAC;IACF,CAAC;IAAC,MAAM,CAAC;QACR,0CAAwC;IACzC,CAAC;AAAA,CACD;AAED;;;GAGG;AACH,SAAS,aAAa,CAAC,IAAY,EAAE,MAAc,EAAiB;IACnE,MAAM,OAAO,GAAG,IAAI,CAAC,IAAI,EAAE,CAAC;IAC5B,IAAI,CAAC,OAAO;QAAE,OAAO,IAAI,CAAC;IAC1B,IAAI,OAAO,CAAC,UAAU,CAAC,GAAG,CAAC,IAAI,CAAC,OAAO,CAAC,UAAU,CAAC,KAAK,CAAC;QAAE,OAAO,IAAI,CAAC;IAEvE,IAAI,OAAO,GAAG,IAAI,CAAC;IACnB,IAAI,OAAO,GAAG,KAAK,CAAC;IAEpB,IAAI,OAAO,CAAC,UAAU,CAAC,GAAG,CAAC,EAAE,CAAC;QAC7B,OAAO,GAAG,IAAI,CAAC;QACf,OAAO,GAAG,OAAO,CAAC,KAAK,CAAC,CAAC,CAAC,CAAC;IAC5B,CAAC;SAAM,IAAI,OAAO,CAAC,UAAU,CAAC,KAAK,CAAC,EAAE,CAAC;QACtC,OAAO,GAAG,OAAO,CAAC,KAAK,CAAC,CAAC,CAAC,CAAC;IAC5B,CAAC;IAED,MAAM,QAAQ,GAAG,MAAM,CAAC,CAAC,CAAC,GAAG,MAAM,GAAG,OAAO,EAAE,CAAC,CAAC,CAAC,OAAO,CAAC;IAC1D,OAAO,OAAO,CAAC,CAAC,CAAC,IAAI,QAAQ,EAAE,CAAC,CAAC,CAAC,QAAQ,CAAC;AAAA,CAC3C;AAED;;;GAGG;AACH,SAAS,aAAa,CAAC,OAAe,EAAW;IAChD,MAAM,KAAK,GAAG,OAAO,CAAC,OAAO,CAAC,CAAC;IAE/B,kCAAkC;IAClC,MAAM,KAAK,GAAG,KAAK,CAAC,KAAK,CAAC,GAAG,CAAC,CAAC;IAC/B,MAAM,IAAI,GAAG,KAAK,CAAC,KAAK,CAAC,MAAM,GAAG,CAAC,CAAC,CAAC;IACrC,IAAI,SAAS,CAAC,GAAG,CAAC,IAAI,CAAC;QAAE,OAAO,IAAI,CAAC;IAErC,yDAAyD;IACzD,KAAK,MAAM,IAAI,IAAI,SAAS,EAAE,CAAC;QAC9B,IAAI,IAAI,CAAC,QAAQ,CAAC,GAAG,CAAC,IAAI,CAAC,KAAK,KAAK,IAAI,IAAI,KAAK,CAAC,QAAQ,CAAC,IAAI,IAAI,EAAE,CAAC,CAAC,EAAE,CAAC;YAC1E,OAAO,IAAI,CAAC;QACb,CAAC;IACF,CAAC;IAED,OAAO,KAAK,CAAC;AAAA,CACb;AAED,gEAAgE;AAChE,SAAS,aAAa,CAAC,GAAW,EAAE,IAAY,EAAE,EAAiB,EAAE,OAAsB,EAAQ;IAClG,IAAI,OAAiB,CAAC;IACtB,IAAI,CAAC;QACJ,OAAO,GAAG,WAAW,CAAC,GAAG,CAAC,CAAC;IAC5B,CAAC;IAAC,MAAM,CAAC;QACR,OAAO,CAAC,0BAA0B;IACnC,CAAC;IAED,KAAK,MAAM,KAAK,IAAI,OAAO,EAAE,CAAC;QAC7B,MAAM,QAAQ,GAAG,IAAI,CAAC,GAAG,EAAE,KAAK,CAAC,CAAC;QAClC,MAAM,OAAO,GAAG,QAAQ,CAAC,IAAI,EAAE,QAAQ,CAAC,CAAC;QACzC,MAAM,QAAQ,GAAG,OAAO,CAAC,OAAO,CAAC,CAAC;QAElC,IAAI,KAAY,CAAC;QACjB,IAAI,CAAC;YACJ,KAAK,GAAG,QAAQ,CAAC,QAAQ,CAAC,CAAC;QAC5B,CAAC;QAAC,MAAM,CAAC;YACR,SAAS,CAAC,uBAAuB;QAClC,CAAC;QAED,IAAI,KAAK,CAAC,WAAW,EAAE,EAAE,CAAC;YACzB,uBAAuB;YACvB,IAAI,aAAa,CAAC,OAAO,CAAC;gBAAE,SAAS;YAErC,qDAAqD;YACrD,IAAI,EAAE,CAAC,OAAO,CAAC,GAAG,QAAQ,GAAG,CAAC;gBAAE,SAAS;YAEzC,2CAA2C;YAC3C,aAAa,CAAC,EAAE,EAAE,QAAQ,EAAE,IAAI,CAAC,CAAC;YAElC,aAAa,CAAC,QAAQ,EAAE,IAAI,EAAE,EAAE,EAAE,OAAO,CAAC,CAAC;YAC3C,SAAS;QACV,CAAC;QAED,IAAI,CAAC,KAAK,CAAC,MAAM,EAAE;YAAE,SAAS;QAE9B,6BAA6B;QAC7B,IAAI,EAAE,CAAC,OAAO,CAAC,QAAQ,CAAC;YAAE,SAAS;QAEnC,YAAY;QACZ,IAAI,KAAK,CAAC,IAAI,GAAG,aAAa;YAAE,SAAS;QACzC,IAAI,KAAK,CAAC,IAAI,KAAK,CAAC;YAAE,SAAS;QAE/B,sBAAsB;QACtB,MAAM,QAAQ,GAAG,cAAc,CAAC,KAAK,CAAC,CAAC;QACvC,IAAI,CAAC,QAAQ;YAAE,SAAS;QAExB,OAAO,CAAC,IAAI,CAAC;YACZ,QAAQ,EAAE,QAAQ;YAClB,QAAQ;YACR,KAAK,EAAE,KAAK,CAAC,OAAO;SACpB,CAAC,CAAC;IACJ,CAAC;AAAA,CACD;AAED;;;;;;;;;GASG;AACH,SAAS,aAAa,CAAC,SAAiB,EAAE,WAAmB,EAAE,OAAsB,EAAE,aAAsB,EAAQ;IACpH,IAAI,OAAiB,CAAC;IACtB,IAAI,CAAC;QACJ,OAAO,GAAG,WAAW,CAAC,SAAS,CAAC,CAAC;IAClC,CAAC;IAAC,MAAM,CAAC;QACR,OAAO;IACR,CAAC;IAED,KAAK,MAAM,KAAK,IAAI,OAAO,EAAE,CAAC;QAC7B,MAAM,QAAQ,GAAG,IAAI,CAAC,SAAS,EAAE,KAAK,CAAC,CAAC;QAExC,IAAI,KAAY,CAAC;QACjB,IAAI,CAAC;YACJ,KAAK,GAAG,QAAQ,CAAC,QAAQ,CAAC,CAAC;QAC5B,CAAC;QAAC,MAAM,CAAC;YACR,SAAS;QACV,CAAC;QAED,IAAI,KAAK,CAAC,WAAW,EAAE,EAAE,CAAC;YACzB,8BAA8B;YAC9B,aAAa,CAAC,QAAQ,EAAE,WAAW,EAAE,OAAO,EAAE,aAAa,IAAI,SAAS,CAAC,CAAC;YAC1E,SAAS;QACV,CAAC;QAED,IAAI,CAAC,KAAK,CAAC,MAAM,EAAE;YAAE,SAAS;QAC9B,IAAI,KAAK,CAAC,IAAI,GAAG,aAAa;YAAE,SAAS;QACzC,IAAI,KAAK,CAAC,IAAI,KAAK,CAAC;YAAE,SAAS;QAE/B,MAAM,QAAQ,GAAG,cAAc,CAAC,KAAK,CAAC,CAAC;QACvC,IAAI,CAAC,QAAQ;YAAE,SAAS;QAExB,0EAA0E;QAC1E,4EAA4E;QAC5E,MAAM,GAAG,GAAG,QAAQ,CAAC,WAAW,EAAE,QAAQ,CAAC,CAAC;QAC5C,MAAM,gBAAgB,GAAG,GAAG,CAAC,UAAU,CAAC,IAAI,CAAC,IAAI,UAAU,CAAC,GAAG,CAAC,CAAC;QACjE,MAAM,aAAa,GAAG,aAAa,IAAI,SAAS,CAAC;QACjD,MAAM,QAAQ,GAAG,gBAAgB,CAAC,CAAC,CAAC,WAAW,QAAQ,CAAC,aAAa,EAAE,QAAQ,CAAC,EAAE,CAAC,CAAC,CAAC,GAAG,CAAC;QAEzF,OAAO,CAAC,IAAI,CAAC;YACZ,QAAQ,EAAE,OAAO,CAAC,QAAQ,CAAC;YAC3B,QAAQ;YACR,KAAK,EAAE,KAAK,CAAC,OAAO;SACpB,CAAC,CAAC;IACJ,CAAC;AAAA,CACD","sourcesContent":["/**\n * File scanner for the semantic search subsystem.\n *\n * Discovers project files for indexing by walking the directory tree,\n * respecting .gitignore rules, and classifying files by type.\n */\n\nimport { existsSync, readdirSync, readFileSync, type Stats, statSync } from \"node:fs\";\nimport { homedir } from \"node:os\";\nimport { extname, isAbsolute, join, relative, sep } from \"node:path\";\nimport ignore from \"ignore\";\nimport { getDrebToolVisibleDirs } from \"../tools/dreb-paths.js\";\nimport type { FileType } from \"./types.js\";\n\n// ============================================================================\n// Public types\n// ============================================================================\n\n/** A file discovered by the scanner, ready for indexing. */\nexport interface ScannedFile {\n\t/** Path relative to the project root (posix separators). */\n\tfilePath: string;\n\t/** Detected file type. */\n\tfileType: FileType;\n\t/** File modification time in milliseconds since epoch. */\n\tmtime: number;\n}\n\n// ============================================================================\n// Constants\n// ============================================================================\n\n/** Maximum file size to index (1 MB). */\nconst MAX_FILE_SIZE = 1024 * 1024;\n\n/** Directories unconditionally skipped during traversal. */\nconst SKIP_DIRS = new Set([\n\t\"node_modules\",\n\t\".git\",\n\t\".dreb/index\",\n\t\".hg\",\n\t\".svn\",\n\t\"__pycache__\",\n\t\".tox\",\n\t\".venv\",\n\t\"dist\",\n\t\"build\",\n\t\".next\",\n\t\".nuxt\",\n\t\"coverage\",\n\t\".cache\",\n]);\n\n/** Extension → FileType mapping. */\nconst EXTENSION_MAP: ReadonlyMap<string, FileType> = new Map<string, FileType>([\n\t// Tree-sitter languages\n\t[\".ts\", \"typescript\"],\n\t[\".tsx\", \"tsx\"],\n\t[\".js\", \"javascript\"],\n\t[\".mjs\", \"javascript\"],\n\t[\".cjs\", \"javascript\"],\n\t[\".py\", \"python\"],\n\t[\".go\", \"go\"],\n\t[\".rs\", \"rust\"],\n\t[\".java\", \"java\"],\n\t[\".c\", \"c\"],\n\t[\".h\", \"c\"],\n\t[\".cpp\", \"cpp\"],\n\t[\".hpp\", \"cpp\"],\n\t[\".cc\", \"cpp\"],\n\t[\".cxx\", \"cpp\"],\n\t[\".hh\", \"cpp\"],\n\t[\".hxx\", \"cpp\"],\n\t// Text file types\n\t[\".md\", \"markdown\"],\n\t[\".mdx\", \"markdown\"],\n\t[\".yml\", \"yaml\"],\n\t[\".yaml\", \"yaml\"],\n\t[\".json\", \"json\"],\n\t[\".toml\", \"toml\"],\n\t[\".txt\", \"plaintext\"],\n\t[\".cfg\", \"plaintext\"],\n\t[\".ini\", \"plaintext\"],\n\t[\".env\", \"plaintext\"],\n\t[\".conf\", \"plaintext\"],\n]);\n\n// ============================================================================\n// Public API\n// ============================================================================\n\n/**\n * Detect the {@link FileType} for a file path based on its extension.\n * Returns `null` for unrecognized extensions or files without an extension.\n */\nexport function detectFileType(filePath: string): FileType | null {\n\tconst ext = extname(filePath).toLowerCase();\n\tif (!ext) return null;\n\treturn EXTENSION_MAP.get(ext) ?? null;\n}\n\n/**\n * Scan a project directory and return all indexable files.\n *\n * Walks the tree rooted at {@link projectRoot}, respects `.gitignore` rules,\n * skips binary / oversized files, and optionally includes memory files from\n * a global memory directory.\n */\nexport async function scanProject(projectRoot: string, globalMemoryDir?: string): Promise<ScannedFile[]> {\n\tconst results: ScannedFile[] = [];\n\n\t// Detect if projectRoot is the home directory — use shallow scan mode\n\t// to avoid recursing into the entire home dir (which would be catastrophic).\n\tconst isHomeDir = isHomeDirPath(projectRoot);\n\n\tif (isHomeDir) {\n\t\t// Shallow mode: only scan top-level files and ~/.dreb/memory/\n\t\tscanShallow(projectRoot, results);\n\t} else {\n\t\t// Normal mode: full recursive walk with .gitignore\n\t\tconst ig = ignore();\n\t\tloadGitignore(ig, projectRoot, projectRoot);\n\t\twalkDirectory(projectRoot, projectRoot, ig, results);\n\t}\n\n\t// Include tool-visible .dreb/ subdirs (bypasses gitignore).\n\t// In home dir mode, global memory is already handled separately below,\n\t// and we don't want to double-scan ~/.dreb/memory/.\n\tif (!isHomeDir) {\n\t\tfor (const dir of getDrebToolVisibleDirs(projectRoot)) {\n\t\t\tscanMemoryDir(dir, projectRoot, results);\n\t\t}\n\t}\n\n\t// Include global memory files if the directory exists\n\tif (globalMemoryDir && existsSync(globalMemoryDir)) {\n\t\tscanMemoryDir(globalMemoryDir, projectRoot, results);\n\t}\n\n\treturn results;\n}\n\n/** Check if a path is the user's home directory. */\nfunction isHomeDirPath(dir: string): boolean {\n\ttry {\n\t\tconst home = homedir();\n\t\t// Normalize trailing slashes for comparison\n\t\tconst normalizedDir = dir.replace(/[/\\\\]+$/, \"\");\n\t\tconst normalizedHome = home.replace(/[/\\\\]+$/, \"\");\n\t\treturn normalizedDir === normalizedHome;\n\t} catch {\n\t\treturn false;\n\t}\n}\n\n/**\n * Shallow scan mode for home directory: only index top-level files\n * (no directory recursion) to avoid scanning the entire home directory.\n * Memory files are handled separately via scanMemoryDir.\n */\nfunction scanShallow(dir: string, results: ScannedFile[]): void {\n\tlet entries: string[];\n\ttry {\n\t\tentries = readdirSync(dir);\n\t} catch {\n\t\treturn;\n\t}\n\n\tfor (const entry of entries) {\n\t\t// Skip dotfiles/dotdirs in home dir (except specific ones we want)\n\t\tif (entry.startsWith(\".\")) continue;\n\n\t\tconst fullPath = join(dir, entry);\n\n\t\tlet stats: Stats;\n\t\ttry {\n\t\t\tstats = statSync(fullPath);\n\t\t} catch {\n\t\t\tcontinue;\n\t\t}\n\n\t\t// Only index files, not directories (shallow mode)\n\t\tif (!stats.isFile()) continue;\n\t\tif (stats.size > MAX_FILE_SIZE) continue;\n\t\tif (stats.size === 0) continue;\n\n\t\tconst fileType = detectFileType(entry);\n\t\tif (!fileType) continue;\n\n\t\tresults.push({\n\t\t\tfilePath: entry,\n\t\t\tfileType,\n\t\t\tmtime: stats.mtimeMs,\n\t\t});\n\t}\n}\n\n// ============================================================================\n// Internal helpers\n// ============================================================================\n\ntype IgnoreMatcher = ReturnType<typeof ignore>;\n\n/** Convert an OS path to posix separators for ignore matching. */\nfunction toPosix(p: string): string {\n\treturn p.split(sep).join(\"/\");\n}\n\n/** Load .gitignore rules from a directory into the ignore matcher. */\nfunction loadGitignore(ig: IgnoreMatcher, dir: string, root: string): void {\n\tconst gitignorePath = join(dir, \".gitignore\");\n\tif (!existsSync(gitignorePath)) return;\n\n\ttry {\n\t\tconst content = readFileSync(gitignorePath, \"utf-8\");\n\t\tconst relDir = relative(root, dir);\n\t\tconst prefix = relDir ? `${toPosix(relDir)}/` : \"\";\n\n\t\tconst patterns = content\n\t\t\t.split(/\\r?\\n/)\n\t\t\t.map((line) => prefixPattern(line, prefix))\n\t\t\t.filter((line): line is string => line !== null);\n\n\t\tif (patterns.length > 0) {\n\t\t\tig.add(patterns);\n\t\t}\n\t} catch {\n\t\t// Unreadable .gitignore — skip silently\n\t}\n}\n\n/**\n * Prefix a .gitignore pattern with a directory path so it applies\n * correctly when matching against root-relative paths.\n */\nfunction prefixPattern(line: string, prefix: string): string | null {\n\tconst trimmed = line.trim();\n\tif (!trimmed) return null;\n\tif (trimmed.startsWith(\"#\") && !trimmed.startsWith(\"\\\\#\")) return null;\n\n\tlet pattern = line;\n\tlet negated = false;\n\n\tif (pattern.startsWith(\"!\")) {\n\t\tnegated = true;\n\t\tpattern = pattern.slice(1);\n\t} else if (pattern.startsWith(\"\\\\!\")) {\n\t\tpattern = pattern.slice(1);\n\t}\n\n\tconst prefixed = prefix ? `${prefix}${pattern}` : pattern;\n\treturn negated ? `!${prefixed}` : prefixed;\n}\n\n/**\n * Check if a directory component (relative to root) should be unconditionally skipped.\n * Handles both top-level names (\"node_modules\") and nested paths (\".dreb/index\").\n */\nfunction shouldSkipDir(relPath: string): boolean {\n\tconst posix = toPosix(relPath);\n\n\t// Check the directory name itself\n\tconst parts = posix.split(\"/\");\n\tconst name = parts[parts.length - 1];\n\tif (SKIP_DIRS.has(name)) return true;\n\n\t// Check multi-segment skip patterns (e.g. \".dreb/index\")\n\tfor (const skip of SKIP_DIRS) {\n\t\tif (skip.includes(\"/\") && (posix === skip || posix.endsWith(`/${skip}`))) {\n\t\t\treturn true;\n\t\t}\n\t}\n\n\treturn false;\n}\n\n/** Recursively walk a directory, collecting indexable files. */\nfunction walkDirectory(dir: string, root: string, ig: IgnoreMatcher, results: ScannedFile[]): void {\n\tlet entries: string[];\n\ttry {\n\t\tentries = readdirSync(dir);\n\t} catch {\n\t\treturn; // Permission denied, etc.\n\t}\n\n\tfor (const entry of entries) {\n\t\tconst fullPath = join(dir, entry);\n\t\tconst relPath = relative(root, fullPath);\n\t\tconst posixRel = toPosix(relPath);\n\n\t\tlet stats: Stats;\n\t\ttry {\n\t\t\tstats = statSync(fullPath);\n\t\t} catch {\n\t\t\tcontinue; // Broken symlink, etc.\n\t\t}\n\n\t\tif (stats.isDirectory()) {\n\t\t\t// Hard-coded skip list\n\t\t\tif (shouldSkipDir(relPath)) continue;\n\n\t\t\t// .gitignore check (directories need trailing slash)\n\t\t\tif (ig.ignores(`${posixRel}/`)) continue;\n\n\t\t\t// Load nested .gitignore before descending\n\t\t\tloadGitignore(ig, fullPath, root);\n\n\t\t\twalkDirectory(fullPath, root, ig, results);\n\t\t\tcontinue;\n\t\t}\n\n\t\tif (!stats.isFile()) continue;\n\n\t\t// .gitignore check for files\n\t\tif (ig.ignores(posixRel)) continue;\n\n\t\t// Size gate\n\t\tif (stats.size > MAX_FILE_SIZE) continue;\n\t\tif (stats.size === 0) continue;\n\n\t\t// File type detection\n\t\tconst fileType = detectFileType(entry);\n\t\tif (!fileType) continue;\n\n\t\tresults.push({\n\t\t\tfilePath: posixRel,\n\t\t\tfileType,\n\t\t\tmtime: stats.mtimeMs,\n\t\t});\n\t}\n}\n\n/**\n * Scan a memory directory (project or global) for indexable files.\n *\n * Memory directories are always fully included — no .gitignore filtering —\n * because they live outside the normal project tree or in `.dreb/` which\n * is typically gitignored.\n *\n * Paths for global memory files are stored with a `~memory/` prefix\n * to distinguish them from project files.\n */\nfunction scanMemoryDir(memoryDir: string, projectRoot: string, results: ScannedFile[], baseMemoryDir?: string): void {\n\tlet entries: string[];\n\ttry {\n\t\tentries = readdirSync(memoryDir);\n\t} catch {\n\t\treturn;\n\t}\n\n\tfor (const entry of entries) {\n\t\tconst fullPath = join(memoryDir, entry);\n\n\t\tlet stats: Stats;\n\t\ttry {\n\t\t\tstats = statSync(fullPath);\n\t\t} catch {\n\t\t\tcontinue;\n\t\t}\n\n\t\tif (stats.isDirectory()) {\n\t\t\t// Recurse into subdirectories\n\t\t\tscanMemoryDir(fullPath, projectRoot, results, baseMemoryDir ?? memoryDir);\n\t\t\tcontinue;\n\t\t}\n\n\t\tif (!stats.isFile()) continue;\n\t\tif (stats.size > MAX_FILE_SIZE) continue;\n\t\tif (stats.size === 0) continue;\n\n\t\tconst fileType = detectFileType(entry);\n\t\tif (!fileType) continue;\n\n\t\t// If the memory dir is inside the project root, use normal relative path.\n\t\t// Otherwise, use a ~memory/ prefix so paths remain unique and identifiable.\n\t\tconst rel = relative(projectRoot, fullPath);\n\t\tconst isOutsideProject = rel.startsWith(\"..\") || isAbsolute(rel);\n\t\tconst rootMemoryDir = baseMemoryDir ?? memoryDir;\n\t\tconst filePath = isOutsideProject ? `~memory/${relative(rootMemoryDir, fullPath)}` : rel;\n\n\t\tresults.push({\n\t\t\tfilePath: toPosix(filePath),\n\t\t\tfileType,\n\t\t\tmtime: stats.mtimeMs,\n\t\t});\n\t}\n}\n"]}
|
|
@@ -16,7 +16,8 @@ export interface SearchOptions {
|
|
|
16
16
|
export declare class SearchEngine {
|
|
17
17
|
private readonly projectRoot;
|
|
18
18
|
private indexManager;
|
|
19
|
-
private
|
|
19
|
+
private embedderPromise;
|
|
20
|
+
private searchQueue;
|
|
20
21
|
constructor(projectRoot: string);
|
|
21
22
|
/** Check if semantic search is available (requires node:sqlite). */
|
|
22
23
|
static isAvailable(): boolean;
|
|
@@ -32,6 +33,14 @@ export declare class SearchEngine {
|
|
|
32
33
|
files: number;
|
|
33
34
|
chunks: number;
|
|
34
35
|
} | null;
|
|
36
|
+
/**
|
|
37
|
+
* Reset the search index — delete the DB and close the IndexManager.
|
|
38
|
+
*
|
|
39
|
+
* Preserves the embedder (expensive ONNX model, unrelated to index state).
|
|
40
|
+
* The next `search()` call will lazily re-create the IndexManager and build
|
|
41
|
+
* a fresh index from scratch.
|
|
42
|
+
*/
|
|
43
|
+
resetIndex(): void;
|
|
35
44
|
/** Dispose resources. */
|
|
36
45
|
close(): void;
|
|
37
46
|
private getIndexManager;
|