@lojban/semantic-search-mcp 1.0.10 → 1.0.12

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (3)
  1. package/README.md +13 -51
  2. package/package.json +1 -1
  3. package/src/index.ts +86 -212
package/README.md CHANGED
@@ -12,7 +12,7 @@ Use it in **Cursor**, **Claude Code**, or any IDE that supports MCP to search th
 
  ## How it works
 
- - **Indexing**: Scans directories for `.txt`, `.md`, `.tsv`, `.csv`, `.json`, `.html`, `.xml`. Each non-empty line gets a vector embedding (via [Hugging Face Transformers.js](https://huggingface.co/docs/transformers.js), model `Xenova/all-MiniLM-L6-v2`) and is stored in a local SQLite database with [@dao-xyz/sqlite3-vec](https://www.npmjs.com/package/@dao-xyz/sqlite3-vec) (SQLite + sqlite-vec for Node and browser).
+ - **Indexing**: On startup, if `SEMANTIC_SEARCH_INDEX_DIRS` is set (comma-separated paths), the server scans those directories in the background for `.txt`, `.md`, `.tsv`, `.csv`. Each non-empty line gets a vector embedding (via [Hugging Face Transformers.js](https://huggingface.co/docs/transformers.js), model `Xenova/all-MiniLM-L6-v2`) and is stored in a local SQLite database with [@dao-xyz/sqlite3-vec](https://www.npmjs.com/package/@dao-xyz/sqlite3-vec) (SQLite + sqlite-vec for Node and browser). Indexing runs asynchronously so the server stays responsive and uses bounded memory.
  - **Search**: You send a natural-language query; the server embeds it and returns the closest lines by cosine similarity.
  - **Storage**: Index is stored in your project's `.semantic-search/data/` (or set `SEMANTIC_SEARCH_DATA_DIR`). No cloud, no API keys.
 
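Reviewer note: the **Search** bullet in the hunk above ranks lines by cosine similarity. The storage module that does the ranking is not part of this diff, so the following is only a minimal sketch of the measure itself; the function name `cosineSimilarity` is hypothetical and does not come from the package.

```typescript
// Hypothetical helper (not from the package): cosine similarity of two
// equal-length embedding vectors, in the range [-1, 1].
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error('vectors must have the same length');
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  // Convention chosen for this sketch: a zero vector is similar to nothing.
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}
```

Identical directions score 1, orthogonal vectors score 0, and opposite directions score -1, which is why a higher score means a closer match.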
@@ -44,22 +44,21 @@ The package is published as [**@lojban/semantic-search-mcp**](https://www.npmjs.
  }
  ```
 
- No `cwd` needed: the server stores its index in your **project directory** (`.semantic-search/data/`), so open your project in Cursor and the index is per-workspace. To use a fixed data directory instead, add `"env": { "SEMANTIC_SEARCH_DATA_DIR": "/path/to/data" }`.
+ No `cwd` needed: the server stores its index in your **project directory** (`.semantic-search/data/`), so open your project in Cursor and the index is per-workspace. To use a fixed data directory instead, add `"env": { "SEMANTIC_SEARCH_DATA_DIR": "/path/to/data" }`. To have the server index directories on startup, set `"env": { "SEMANTIC_SEARCH_INDEX_DIRS": "./dictionary,./glossary" }` (comma-separated paths).
 
- 2. **Restart Cursor** (or reload the window).
+ 2. **Restart Cursor** (or reload the window). If `SEMANTIC_SEARCH_INDEX_DIRS` is set, indexing starts automatically in the background.
 
  3. In chat or Composer, ask the AI to use the tools:
- - **Index**: "Index the directory `./my-dictionary`" (or a list of paths). Optionally "clear existing index first."
  - **Search**: "Search the index for …" or "Find entries similar to …"
- - **Stats**: "How many lines/files are in the index?"
+ - **Stats**: "How many lines/files are in the index?" or "Is indexing still running?" — stats include progress and start time (locale-formatted) when indexing is in progress.
 
- The AI will call `index_directories`, `search`, and `get_index_stats` for you.
+ The AI will call `search` and `get_index_stats` for you.
 
  ## Use in other AI IDEs (Claude Code, etc.)
 
  Any environment that supports MCP over stdio can use this server. Run:
 
- - **One-liner**: `npx -y @lojban/semantic-search-mcp` — dependencies are installed on first run; index is stored in the current working directory's `.semantic-search/data/`. Same tools: `index_directories`, `search`, `get_index_stats`.
+ - **One-liner**: `npx -y @lojban/semantic-search-mcp` — dependencies are installed on first run; index is stored in the current working directory's `.semantic-search/data/`. Set env `SEMANTIC_SEARCH_INDEX_DIRS` (comma-separated paths) to index those directories on startup in the background. Tools: `search`, `get_index_stats`.
 
  **From source**: Clone the repo, run `npm install` once, then use `"command": "npx", "args": ["tsx", "src/index.ts"], "cwd": "/path/to/semantic-search-mcp"` or `"command": "node", "args": ["/path/to/semantic-search-mcp/run.mjs"]` (no `cwd` needed with the latter). See [MCP_SETUP.md](MCP_SETUP.md) for details.
 
@@ -67,58 +66,21 @@ Any environment that supports MCP over stdio can use this server. Run:
 
  | Tool | Description |
  |------|-------------|
- | `index_directories` | Scan one or more directories and index every line of supported text files. Pass `directories` (array of paths) or set env `SEMANTIC_SEARCH_INDEX_DIRS` (comma-separated). Optional: `clear_existing: true` to replace the index. |
  | `search` | Semantic search: `query` (string), optional `limit` (default 10). Returns file path, line number, content, and similarity score. |
- | `get_index_stats` | Returns total number of indexed files and lines. |
+ | `get_index_stats` | Returns total number of indexed files and lines. When indexing is running in the background, also returns progress: `indexing.started_at` (locale-formatted), `lines_indexed_so_far`, `files_indexed_so_far`, and `in_progress`. |
 
- ### Indexing several directories
+ ### Indexing on startup
 
- `index_directories` accepts an array of paths. You can index multiple unrelated folders in one go (they are merged into a single index). Use **absolute paths** or paths relative to your project root.
-
- **In Cursor (natural language):**
-
- - "Index these directories: `./dictionary`, `./glossary`, and `./notes`."
- - "Index `./data/lojban-eng` and `/home/me/other-corpus` with clear_existing true."
- - "Clear the index and re-index only `./tsv` and `./exports`."
-
- **Under the hood** the tool receives:
-
- ```json
- {
- "directories": ["./dictionary", "./glossary", "./notes"],
- "clear_existing": false
- }
- ```
-
- To replace the entire index with new content from several places:
-
- ```json
- {
- "directories": ["/path/to/dict1", "/path/to/dict2", "/path/to/corpus"],
- "clear_existing": true
- }
- ```
-
- Paths can be anywhere on disk (e.g. different drives or projects); the server reads and indexes all supported text/TSV/CSV files under each directory recursively.
-
- ### Memory and batch size
-
- Indexing uses **adaptive batch size** based on free system RAM so the OS doesn’t freeze on low-memory machines. The server reads `os.freemem()`, keeps a reserve (default 400MB), and caps batch size between 32 and 512 lines. You can tune this with env vars:
-
- - **`SEMANTIC_SEARCH_RESERVE_MB`** — MB of RAM to keep free (default `400`).
- - **`SEMANTIC_SEARCH_MIN_BATCH`** — minimum lines per batch (default `32`).
- - **`SEMANTIC_SEARCH_MAX_BATCH`** — maximum lines per batch (default `512`).
-
- Example: `SEMANTIC_SEARCH_RESERVE_MB=800 SEMANTIC_SEARCH_MAX_BATCH=256` to leave more headroom and use smaller batches.
+ Set the environment variable **`SEMANTIC_SEARCH_INDEX_DIRS`** to a comma-separated list of directories to index. When the MCP server starts, it begins indexing those directories in the background (async). The index is cleared and rebuilt each time the server starts. Use absolute paths or paths relative to the server's working directory. The server reads and indexes all supported text/TSV/CSV files under each directory recursively. Indexing uses bounded memory and yields to the event loop so the OS stays responsive.
 
  ## Example: Lojban dictionary gaps
 
- 1. Put your dictionary TSV (e.g. `jbo-eng.tsv`) in a folder.
- 2. In Cursor: "Index the directory `./dictionary` with clear_existing true."
- 3. Then: "Search for entries similar to 'to cause to become warm' and limit 20."
+ 1. Put your dictionary TSV (e.g. `jbo-eng.tsv`) in a folder (e.g. `./dictionary`).
+ 2. Set `SEMANTIC_SEARCH_INDEX_DIRS=./dictionary` in your MCP config (or in the environment). Restart the server; indexing runs in the background.
+ 3. In Cursor: "Search for entries similar to 'to cause to become warm' and limit 20."
  4. Or: "Search for 'emotional state of joy' and show me what we have; then suggest word combinations the dictionary might be missing."
 
- The index is stored in `.semantic-search/data/vectors.db` (or your project root). Re-index when you add or change files.
+ The index is stored in `.semantic-search/data/vectors.db` (or your project root). Restart the server to re-index when you add or change files.
 
  ## Development
 
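Reviewer note: the README now makes `SEMANTIC_SEARCH_INDEX_DIRS` the only way to choose what gets indexed, so the parsing rules matter. The sketch below mirrors the split/trim/filter expression that appears in the `src/index.ts` part of this diff; wrapping it in a function named `parseIndexDirs` is an assumption made here for illustration, since the package inlines the expression.

```typescript
// Hypothetical wrapper around the expression used in src/index.ts:
// split on commas, trim whitespace, drop empty entries.
function parseIndexDirs(raw: string | undefined): string[] {
  if (!raw) return [];
  return raw
    .split(',')
    .map((d) => d.trim())
    .filter(Boolean);
}

// Example: stray spaces and a trailing comma are tolerated.
const dirs = parseIndexDirs(' ./dictionary, ./glossary ,');
console.log(dirs);
```

An unset or empty variable yields an empty list, which in the new code means no background indexing is started.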
package/package.json CHANGED
@@ -1,6 +1,6 @@
  {
    "name": "@lojban/semantic-search-mcp",
-   "version": "1.0.10",
+   "version": "1.0.12",
    "description": "Local-first MCP server for semantic search using transformers.js and SQLite",
    "type": "module",
    "scripts": {
package/src/index.ts CHANGED
@@ -5,10 +5,9 @@ import {
    CallToolRequestSchema,
    ListToolsRequestSchema,
  } from '@modelcontextprotocol/sdk/types.js';
- import os from 'node:os';
  import path from 'path';
  import { getEmbedding, getBatchEmbeddings } from './embeddings.js';
- import { createVectorStorage, type SearchResult, type VectorStorage } from './storage.js';
+ import { createVectorStorage, type SearchResult } from './storage.js';
  import { scanDirectories } from './scanner.js';
 
  // Data dir: use env, or project cwd so each workspace has its own index when run via npx from Cursor
@@ -17,185 +16,78 @@ const dataDir =
    path.join(process.cwd(), '.semantic-search', 'data');
  const DB_PATH = path.join(dataDir, 'vectors.db');
 
- type IndexStatus = {
-   isIndexing: boolean;
-   startedAt: number | null;
-   finishedAt: number | null;
-   lastError: string | null;
-   indexedLines: number;
-   indexedFiles: number;
-   directories: string[];
+ // Background indexing state (progress for get_index_stats)
+ const indexingState = {
+   inProgress: false,
+   startedAt: null as Date | null,
+   linesIndexed: 0,
+   filesIndexed: 0,
+   error: null as string | null,
  };
 
- const indexStatus: IndexStatus = {
-   isIndexing: false,
-   startedAt: null,
-   finishedAt: null,
-   lastError: null,
-   indexedLines: 0,
-   indexedFiles: 0,
-   directories: [],
- };
-
- // Single "mutex": only one indexing job is allowed to run. Starting a new job aborts the previous one.
- let currentIndexingAbortController: AbortController | null = null;
- let currentJobId = 0;
-
- // Adaptive batch size: reserve RAM so we don't freeze the OS (env overrides in bytes or MB)
- const RESERVE_MB = Number(process.env.SEMANTIC_SEARCH_RESERVE_MB) || 400;
- const RESERVE_BYTES = RESERVE_MB * 1024 * 1024;
- const MIN_BATCH = Number(process.env.SEMANTIC_SEARCH_MIN_BATCH) || 32;
- const MAX_BATCH = Number(process.env.SEMANTIC_SEARCH_MAX_BATCH) || 512;
-
- /** Rough bytes per indexed line in memory: line text + path + embedding (384 floats) + overhead */
- const BYTES_PER_LINE_ESTIMATE = 4000;
-
- /**
-  * Compute batch size from current free system RAM. Keeps reserve free to avoid freezing the OS.
-  */
- function getAdaptiveBatchSize(): number {
-   const free = os.freemem();
-   const available = free > RESERVE_BYTES ? free - RESERVE_BYTES : Math.floor(free / 2);
-   const batch = Math.floor(available / BYTES_PER_LINE_ESTIMATE);
-   const clamped = Math.max(MIN_BATCH, Math.min(MAX_BATCH, batch));
-   return clamped;
- }
-
- /**
-  * Request indexing of directories. If another indexing job is running, it is aborted first.
-  * Then a new job is started (clears index and rebuilds).
-  */
- function requestIndexing(storage: VectorStorage, directories: string[]): void {
-   if (!directories.length) {
-     console.error('No directories to index. Set SEMANTIC_SEARCH_INDEX_DIRS (comma-separated paths).');
-     return;
-   }
+ // Batch size kept small to avoid high RAM usage during indexing
+ const INDEX_BATCH_SIZE = 256;
 
-   // Abort any in-progress indexing so it doesn't conflict or flush this job's work.
-   if (currentIndexingAbortController) {
-     currentIndexingAbortController.abort();
-     currentIndexingAbortController = null;
-   }
-
-   currentJobId += 1;
-   const jobId = currentJobId;
-   currentIndexingAbortController = new AbortController();
-   const signal = currentIndexingAbortController.signal;
-
-   indexStatus.isIndexing = true;
-   indexStatus.startedAt = Date.now();
-   indexStatus.finishedAt = null;
-   indexStatus.lastError = null;
-   indexStatus.directories = directories;
-   indexStatus.indexedLines = 0;
-   indexStatus.indexedFiles = 0;
-
-   void startIndexing(storage, directories, signal, jobId);
- }
-
- async function startIndexing(
-   storage: VectorStorage,
-   directories: string[],
-   signal: AbortSignal,
-   jobId: number
+ async function runBackgroundIndexing(
+   storage: Awaited<ReturnType<typeof createVectorStorage>>,
+   directories: string[]
  ): Promise<void> {
-   const isCurrentJob = (): boolean => currentJobId === jobId;
+   indexingState.inProgress = true;
+   indexingState.startedAt = new Date();
+   indexingState.linesIndexed = 0;
+   indexingState.filesIndexed = 0;
+   indexingState.error = null;
+   storage.clear();
 
    try {
-     if (signal.aborted) return;
-
-     storage.clear();
-     console.error(`Scanning ${directories.length} directories (background indexing)...`);
-
-     let indexedCount = 0;
-     // eslint-disable-next-line @typescript-eslint/no-explicit-any
-     let currentBatch: any[] = [];
-
-     const processBatch = async (batchToProcess: any[]) => {
-       if (batchToProcess.length === 0) return;
-       const contents = batchToProcess.map((l) => l.content);
+     let currentBatch: Array<{ filePath: string; lineNumber: number; content: string }> = [];
+     let processingPromise: Promise<void> | null = null;
+     const seenFiles = new Set<string>();
+
+     const processBatch = async (
+       batch: Array<{ filePath: string; lineNumber: number; content: string }>
+     ): Promise<void> => {
+       if (batch.length === 0) return;
+       const contents = batch.map((l) => l.content);
        const embeddings = await getBatchEmbeddings(contents);
-
-       const batchData = batchToProcess.map((line, idx) => ({
+       const batchData = batch.map((line, idx) => ({
          filePath: line.filePath,
          lineNumber: line.lineNumber,
          content: line.content,
          embedding: embeddings[idx],
        }));
-
        await storage.upsertLinesBatch(batchData);
-       indexedCount += batchToProcess.length;
-       if (isCurrentJob()) indexStatus.indexedLines = indexedCount;
-       console.error(`Indexed ${indexedCount} lines...`);
+       for (const l of batch) seenFiles.add(l.filePath);
+       indexingState.linesIndexed += batch.length;
+       indexingState.filesIndexed = seenFiles.size;
      };
 
-     // Single task queue: only one batch is processed at a time (no pipelining).
-     // We do not read the next batch until the current one is fully done, to avoid memory spikes and OS freezes.
-     let batchSize = getAdaptiveBatchSize();
-     console.error(`Adaptive batch size: ${batchSize} (free RAM: ${Math.round(os.freemem() / 1024 / 1024)}MB, reserve: ${RESERVE_MB}MB)`);
+     const yieldToEventLoop = (): Promise<void> =>
+       new Promise((resolve) => setImmediate(resolve));
 
      for await (const line of scanDirectories(directories)) {
-       if (signal.aborted) break;
-
        currentBatch.push(line);
-       if (currentBatch.length >= batchSize) {
+       if (currentBatch.length >= INDEX_BATCH_SIZE) {
+         if (processingPromise) await processingPromise;
          const batchToProcess = currentBatch;
          currentBatch = [];
-         batchSize = getAdaptiveBatchSize();
-
-         await processBatch(batchToProcess);
-         if (signal.aborted) break;
+         processingPromise = processBatch(batchToProcess);
+         await yieldToEventLoop();
        }
      }
 
-     if (signal.aborted) {
-       console.error('Indexing aborted (new job started or cancelled).');
-       return;
-     }
-
-     if (currentBatch.length > 0) {
-       await processBatch(currentBatch);
-     }
-
-     if (!isCurrentJob()) return;
+     if (processingPromise) await processingPromise;
+     if (currentBatch.length > 0) await processBatch(currentBatch);
 
      const stats = await storage.getStats();
-     indexStatus.indexedFiles = stats.totalFiles;
-     indexStatus.indexedLines = stats.totalLines;
-     indexStatus.finishedAt = Date.now();
-
-     console.error(
-       `Finished indexing ${stats.totalLines} lines from ${stats.totalFiles} files in background job.`
-     );
+     indexingState.linesIndexed = stats.totalLines;
+     indexingState.filesIndexed = stats.totalFiles;
    } catch (err) {
-     const message = err instanceof Error ? err.message : String(err);
-     if (isCurrentJob()) {
-       indexStatus.lastError = message;
-       indexStatus.finishedAt = Date.now();
-     }
-     console.error('Error during indexing job:', err);
+     indexingState.error = err instanceof Error ? err.message : String(err);
+     console.error('Background indexing error:', indexingState.error);
    } finally {
-     if (isCurrentJob()) {
-       indexStatus.isIndexing = false;
-     }
-     if (currentIndexingAbortController && currentJobId === jobId) {
-       currentIndexingAbortController = null;
-     }
-   }
- }
-
- function ensureInitialIndexing(storage: VectorStorage): void {
-   const envDirs = process.env.SEMANTIC_SEARCH_INDEX_DIRS;
-   const directories = envDirs ? envDirs.split(',').map((d) => d.trim()).filter(Boolean) : [];
-
-   if (!directories.length) {
-     console.error(
-       'Semantic Search MCP: SEMANTIC_SEARCH_INDEX_DIRS is not set; automatic indexing on startup is disabled.'
-     );
-     return;
+     indexingState.inProgress = false;
    }
-
-   requestIndexing(storage, directories);
  }
 
  async function main() {
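Reviewer note: the new loop in `runBackgroundIndexing` replaces the old strictly sequential batching with a one-batch-in-flight pipeline: the next batch is filled while the previous one is still being embedded and stored. The sketch below isolates that pattern under stated assumptions: `fakeLines`, `indexInBatches`, and the numeric items are illustrative stand-ins, and `setTimeout(..., 0)` is used in place of the diff's `setImmediate` so the example does not depend on Node typings.

```typescript
// Illustrative stand-in for scanDirectories(): yields n items asynchronously.
async function* fakeLines(n: number): AsyncGenerator<number> {
  for (let i = 0; i < n; i++) yield i;
}

// Stand-in for the diff's setImmediate-based yield.
const yieldToEventLoop = (): Promise<void> =>
  new Promise((resolve) => setTimeout(resolve, 0));

// Hypothetical generic version of the loop: at most one batch is being
// processed while the next one is being filled.
async function indexInBatches(
  source: AsyncIterable<number>,
  batchSize: number,
  processBatch: (batch: number[]) => Promise<void>
): Promise<void> {
  let current: number[] = [];
  let inFlight: Promise<void> | null = null;

  for await (const item of source) {
    current.push(item);
    if (current.length >= batchSize) {
      if (inFlight) await inFlight; // wait for the previous batch first
      const toProcess = current;
      current = [];
      inFlight = processBatch(toProcess); // started, deliberately not awaited
      await yieldToEventLoop(); // let other work (e.g. requests) run
    }
  }

  if (inFlight) await inFlight;
  if (current.length > 0) await processBatch(current); // trailing partial batch
}
```

With this shape, memory holds roughly two batches at once (one in flight, one being filled), which is the trade-off against the removed version that held only one.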
@@ -216,19 +108,9 @@ async function main() {
    server.setRequestHandler(ListToolsRequestSchema, async () => {
      return {
        tools: [
-         {
-           name: 'index_directories',
-           description:
-             'Trigger background indexing of directories from SEMANTIC_SEARCH_INDEX_DIRS (comma-separated). Clears and rebuilds the index asynchronously.',
-           inputSchema: {
-             type: 'object',
-             properties: {},
-           },
-         },
          {
            name: 'search',
-           description:
-             'Search for lines semantically similar to the query. Returns the most relevant lines from indexed files.',
+           description: 'Search for lines semantically similar to the query. Returns the most relevant lines from indexed files.',
            inputSchema: {
              type: 'object',
              properties: {
@@ -247,7 +129,7 @@ async function main() {
          },
          {
            name: 'get_index_stats',
-           description: 'Get statistics and progress for the current index (files, lines, progress state)',
+           description: 'Get statistics about the current index (number of files and lines indexed). If indexing is running in the background, returns progress and start time (locale-formatted).',
            inputSchema: {
              type: 'object',
              properties: {},
@@ -262,40 +144,6 @@ async function main() {
 
      try {
        switch (name) {
-         case 'index_directories': {
-           const envDirs = process.env.SEMANTIC_SEARCH_INDEX_DIRS;
-           const directories = envDirs ? envDirs.split(',').map((d) => d.trim()).filter(Boolean) : [];
-           if (!directories.length) {
-             throw new Error(
-               'No directories to index. Set SEMANTIC_SEARCH_INDEX_DIRS (comma-separated paths).'
-             );
-           }
-
-           // Abort any in-progress indexing and start a new job (clears and rebuilds).
-           requestIndexing(storage, directories);
-
-           const stats = await storage.getStats();
-           return {
-             content: [
-               {
-                 type: 'text',
-                 text: JSON.stringify({
-                   success: true,
-                   indexing: indexStatus.isIndexing,
-                   indexed_lines: stats.totalLines,
-                   indexed_files: stats.totalFiles,
-                   started_at: indexStatus.startedAt,
-                   finished_at: indexStatus.finishedAt,
-                   last_error: indexStatus.lastError,
-                   message: indexStatus.isIndexing
-                     ? `Indexing started in background. Currently ${stats.totalLines} lines from ${stats.totalFiles} files in index.`
-                     : `Indexing completed. Indexed ${stats.totalLines} lines from ${stats.totalFiles} files.`,
-                 }),
-               },
-             ],
-           };
-         }
-
          case 'search': {
            const query = (args as { query: string; limit?: number }).query;
            const limit = (args as { query: string; limit?: number }).limit ?? 10;
@@ -323,21 +171,41 @@ async function main() {
 
          case 'get_index_stats': {
            const stats = await storage.getStats();
+           const payload: {
+             total_files: number;
+             total_lines: number;
+             indexing?: {
+               in_progress: boolean;
+               started_at: string;
+               lines_indexed_so_far: number;
+               files_indexed_so_far: number;
+               error?: string;
+             };
+           } = {
+             total_files: stats.totalFiles,
+             total_lines: stats.totalLines,
+           };
+           if (indexingState.inProgress && indexingState.startedAt) {
+             payload.indexing = {
+               in_progress: true,
+               started_at: indexingState.startedAt.toLocaleString(),
+               lines_indexed_so_far: indexingState.linesIndexed,
+               files_indexed_so_far: indexingState.filesIndexed,
+             };
+           } else if (indexingState.error) {
+             payload.indexing = {
+               in_progress: false,
+               started_at: indexingState.startedAt?.toLocaleString() ?? '',
+               lines_indexed_so_far: indexingState.linesIndexed,
+               files_indexed_so_far: indexingState.filesIndexed,
+               error: indexingState.error,
+             };
+           }
            return {
              content: [
                {
                  type: 'text',
-                 text: JSON.stringify({
-                   total_files: stats.totalFiles,
-                   total_lines: stats.totalLines,
-                   is_indexing: indexStatus.isIndexing,
-                   indexed_lines: indexStatus.indexedLines,
-                   indexed_files: indexStatus.indexedFiles,
-                   started_at: indexStatus.startedAt,
-                   finished_at: indexStatus.finishedAt,
-                   last_error: indexStatus.lastError,
-                   directories: indexStatus.directories,
-                 }),
+                 text: JSON.stringify(payload),
                },
              ],
            };
@@ -359,8 +227,14 @@ async function main() {
    await server.connect(transport);
    console.error('Semantic Search MCP Server running on stdio');
 
-   // Kick off initial background indexing when the MCP server is enabled.
-   ensureInitialIndexing(storage);
+   const envDirs = process.env.SEMANTIC_SEARCH_INDEX_DIRS;
+   const directories = envDirs ? envDirs.split(',').map((d) => d.trim()).filter(Boolean) : [];
+   if (directories.length > 0) {
+     console.error(`Starting background indexing for ${directories.length} directories...`);
+     runBackgroundIndexing(storage, directories).catch((err) => {
+       console.error('Background indexing failed:', err);
+     });
+   }
  }
 
  main().catch(console.error);
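
Reviewer note: the reworked `get_index_stats` handler in this diff has three outcomes that are easy to miss when reading the inline branches: indexing running, indexing failed, and idle or finished. The sketch below restates that branching as a pure function so the cases can be checked in isolation. The names `IndexingState`, `StatsPayload`, and `buildStatsPayload` are hypothetical; the field names are taken from the diff.

```typescript
// Shape of the module-level indexingState object in the diff.
interface IndexingState {
  inProgress: boolean;
  startedAt: Date | null;
  linesIndexed: number;
  filesIndexed: number;
  error: string | null;
}

// Shape of the JSON payload built by the get_index_stats handler.
interface StatsPayload {
  total_files: number;
  total_lines: number;
  indexing?: {
    in_progress: boolean;
    started_at: string;
    lines_indexed_so_far: number;
    files_indexed_so_far: number;
    error?: string;
  };
}

// Hypothetical pure-function restatement of the handler's branching.
function buildStatsPayload(
  stats: { totalFiles: number; totalLines: number },
  state: IndexingState
): StatsPayload {
  const payload: StatsPayload = {
    total_files: stats.totalFiles,
    total_lines: stats.totalLines,
  };
  const progress = {
    started_at: state.startedAt?.toLocaleString() ?? '',
    lines_indexed_so_far: state.linesIndexed,
    files_indexed_so_far: state.filesIndexed,
  };
  if (state.inProgress && state.startedAt) {
    payload.indexing = { in_progress: true, ...progress };
  } else if (state.error) {
    payload.indexing = { in_progress: false, ...progress, error: state.error };
  }
  // Idle or successfully finished: no `indexing` key at all.
  return payload;
}
```

One consequence worth noting for clients: once indexing finishes without error, the `indexing` key disappears, so its absence cannot distinguish "never started" from "completed".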