searchsocket 0.5.0 → 0.6.0

package/README.md CHANGED
@@ -1,34 +1,44 @@
1
1
  # SearchSocket
2
2
 
3
- Semantic site search and MCP retrieval for SvelteKit content projects.
3
+ Semantic site search and MCP retrieval for SvelteKit content projects. Index your site, search it from the browser or AI tools, and scroll users to the exact content they're looking for.
4
4
 
5
- **Requirements**: Node.js >= 20
5
+ **Requirements**: Node.js >= 20 | **Backend**: [Upstash Vector](https://upstash.com/docs/vector/overall/getstarted) | **License**: MIT
6
+
7
+ ## How it works
8
+
9
+ ```
10
+ SvelteKit Pages → Extractor (Cheerio + Turndown) → Chunker → Upstash Vector
11
+
12
+ Search UI ← SvelteKit API Hook ← Search Engine + Ranking
13
+
14
+ MCP Endpoint → Claude Code / Claude Desktop
15
+ ```
16
+
17
+ SearchSocket extracts content from your SvelteKit site, converts it to markdown, splits it into chunks, and stores them in Upstash Vector. At runtime, the SvelteKit hook serves both a search API for your frontend and an MCP endpoint for AI tools.
6
18
 
7
19
  ## Features
8
20
 
9
- - **Embeddings**: Jina AI `jina-embeddings-v5-text-small` with task-specific LoRA adapters (configurable)
10
- - **Vector Backend**: Turso/libSQL with vector search (local file DB for development, remote for production)
11
- - **Rerank**: Jina `jina-reranker-v3` enabled by default, using the same API key
12
- - **Page Aggregation**: Group results by page with score-weighted chunk decay
13
- - **Meta Extraction**: Automatically extracts `<meta name="description">` and `<meta name="keywords">` for improved relevance
14
- - **SvelteKit Integrations**:
15
- - `searchsocketHandle()` for `POST /api/search` endpoint
16
- - `searchsocketVitePlugin()` for build-triggered indexing
17
- - **Client Library**: `createSearchClient()` for browser-side search, `buildResultUrl()` for scroll-to-section links
18
- - **Scroll-to-Text**: `searchsocketScrollToText()` auto-scrolls to matching sections on navigation
19
- - **MCP Server**: Model Context Protocol tools for search and page retrieval
21
+ - **Semantic + keyword search** — Upstash Vector handles hybrid search with built-in reranking and input enrichment
22
+ - **Dual search**: parallel page-level and chunk-level queries with configurable score blending
23
+ - **Scroll-to-text**: auto-scroll to the matching section when a user clicks a search result, with CSS Highlight API and Text Fragment support
24
+ - **SvelteKit integration**: server hook for the search API, Vite plugin for build-triggered indexing
25
+ - **Svelte 5 components**: reactive `createSearch` store and `<SearchSocket>` metadata component
26
+ - **MCP server** — six tools for Claude Code, Claude Desktop, and other MCP clients (stdio + HTTP)
27
+ - **llms.txt generation**: auto-generate LLM-friendly site indexes during indexing
28
+ - **Four source modes** — index from static output, build manifest, a running server, or raw markdown files
29
+ - **CLI**: init, index, search, dev, status, doctor, clean, prune, test, mcp, add
20
30
 
21
31
  ## Install
22
32
 
23
33
  ```bash
24
- # pnpm
25
34
  pnpm add -D searchsocket
26
-
27
- # npm
28
- npm install -D searchsocket
29
35
  ```
30
36
 
31
- SearchSocket is typically a dev dependency for CLI indexing. If you use `searchsocketHandle()` at runtime (e.g., in a Node server adapter), add it as a regular dependency instead.
37
+ SearchSocket is typically a dev dependency since indexing runs at build time. If you use `searchsocketHandle()` at runtime (e.g., in a Node server adapter or serving the MCP endpoint from a production deployment), add it as a regular dependency:
38
+
39
+ ```bash
40
+ pnpm add searchsocket
41
+ ```
32
42
 
33
43
  ## Quickstart
34
44
 
@@ -38,100 +48,134 @@ SearchSocket is typically a dev dependency for CLI indexing. If you use `searchs
38
48
  pnpm searchsocket init
39
49
  ```
40
50
 
41
- This creates:
42
- - `searchsocket.config.ts` — minimal config file
43
- - `.searchsocket/` — state directory (added to `.gitignore`)
51
+ This creates `searchsocket.config.ts` and the `.searchsocket/` state directory, wires up your SvelteKit hooks and Vite config, and generates `.mcp.json` for Claude Code.
44
52
 
45
53
  ### 2. Configure
46
54
 
47
55
  Minimal config (`searchsocket.config.ts`):
48
56
 
49
57
  ```ts
50
- export default {
51
- embeddings: { apiKeyEnv: "JINA_API_KEY" }
52
- };
58
+ export default {};
53
59
  ```
54
60
 
55
- **That's it!** Turso defaults work out of the box:
56
- - **Development**: Uses local file DB at `.searchsocket/vectors.db`
57
- - **Production**: Set `TURSO_DATABASE_URL` and `TURSO_AUTH_TOKEN` to use remote Turso
61
+ That's it: the defaults handle the rest. SearchSocket reads `UPSTASH_VECTOR_REST_URL` and `UPSTASH_VECTOR_REST_TOKEN` from your environment automatically.
58
62
 
59
- ### 3. Add SvelteKit API Hook
63
+ ### 3. Set environment variables
60
64
 
61
- Create or update `src/hooks.server.ts`:
65
+ ```bash
66
+ # .env
67
+ UPSTASH_VECTOR_REST_URL=https://...
68
+ UPSTASH_VECTOR_REST_TOKEN=...
69
+ ```
70
+
71
+ Create an [Upstash Vector index](https://console.upstash.com/vector) with the `bge-large-en-v1.5` embedding model (1024 dimensions). Copy the REST URL and token.
72
+
73
+ ### 4. Add the SvelteKit hook
74
+
75
+ The `init` command does this for you, but if you need to do it manually:
62
76
 
63
77
  ```ts
78
+ // src/hooks.server.ts
64
79
  import { searchsocketHandle } from "searchsocket/sveltekit";
65
80
 
66
81
  export const handle = searchsocketHandle();
67
82
  ```
68
83
 
69
- This exposes `POST /api/search` with automatic scope resolution.
84
+ This exposes `POST /api/search`, `GET /api/search/health`, the MCP endpoint at `/api/mcp`, and page retrieval routes.
70
85
 
71
- ### 4. Set Environment Variables
86
+ If you run into SSR bundling issues, mark SearchSocket as external in your Vite config:
72
87
 
73
- The CLI automatically loads `.env` from the working directory on startup, so your existing `.env` file works out of the box — no wrapper scripts or shell exports needed.
88
+ ```ts
89
+ // vite.config.ts
90
+ export default defineConfig({
91
+ plugins: [sveltekit()],
92
+ ssr: {
93
+ external: ["searchsocket", "searchsocket/sveltekit", "searchsocket/client"]
94
+ }
95
+ });
96
+ ```
97
+
98
+ ### 5. Add search to your frontend
99
+
100
+ Copy the search dialog template into your project:
74
101
 
75
- Development (`.env`):
76
102
  ```bash
77
- JINA_API_KEY=jina_...
103
+ pnpm searchsocket add search-dialog
78
104
  ```
79
105
 
80
- Production (add these for remote Turso):
81
- ```bash
82
- JINA_API_KEY=jina_...
83
- TURSO_DATABASE_URL=libsql://your-db.turso.io
84
- TURSO_AUTH_TOKEN=eyJ...
106
+ This copies a Svelte 5 component to `src/lib/components/search/SearchDialog.svelte` with Cmd+K built in. Import it in your layout and add the scroll-to-text handler:
107
+
108
+ ```svelte
109
+ <!-- src/routes/+layout.svelte -->
110
+ <script>
111
+ import { afterNavigate } from "$app/navigation";
112
+ import { searchsocketScrollToText } from "searchsocket/sveltekit";
113
+ import SearchDialog from "$lib/components/search/SearchDialog.svelte";
114
+
115
+ afterNavigate(searchsocketScrollToText);
116
+ </script>
117
+
118
+ <SearchDialog />
119
+
120
+ <slot />
85
121
  ```
86
122
 
87
- ### 5. Index Your Content
123
+ Users can now press Cmd+K to search. See [Building a Search UI](docs/search-ui.md) for scoped search, custom styling, and more patterns.
124
+
125
+ ### 6. Deploy
126
+
127
+ SearchSocket is designed to index automatically on deploy. The `init` command already added the Vite plugin to your config. Set these environment variables on your hosting platform (Vercel, Cloudflare, etc.):
128
+
129
+ | Variable | Value |
130
+ |----------|-------|
131
+ | `UPSTASH_VECTOR_REST_URL` | Your Upstash Vector REST URL |
132
+ | `UPSTASH_VECTOR_REST_TOKEN` | Your Upstash Vector REST token |
133
+ | `SEARCHSOCKET_AUTO_INDEX` | `1` |
134
+
135
+ Every deploy will build your site, index the content, and serve the search API — fully automated.
136
+
137
+ For local testing, you can also build and index manually:
88
138
 
89
139
  ```bash
90
- pnpm searchsocket index --changed-only
140
+ pnpm build
141
+ pnpm searchsocket index
91
142
  ```
92
143
 
93
- SearchSocket auto-detects the source mode based on your config:
94
- - **`static-output`** (default): Reads prerendered HTML from `build/`
95
- - **`build`**: Discovers routes from SvelteKit build manifest and renders via preview server
96
- - **`crawl`**: Fetches pages from a running HTTP server
97
- - **`content-files`**: Reads markdown/svelte source files directly
144
+ ### 7. Connect Claude Code (optional)
145
+
146
+ Point Claude Code at your deployed site's MCP endpoint:
98
147
 
99
- The indexing pipeline:
100
- - Extracts content from `<main>` (configurable), including `<meta>` description and keywords
101
- - Chunks text with semantic heading boundaries
102
- - Prepends page title to each chunk for embedding context
103
- - Generates a synthetic summary chunk per page for identity matching
104
- - Generates embeddings via Jina AI (with task-specific LoRA adapters for indexing vs search)
105
- - Stores vectors in Turso/libSQL with cosine similarity index
148
+ ```json
149
+ {
150
+ "mcpServers": {
151
+ "searchsocket": {
152
+ "type": "http",
153
+ "url": "https://your-site.com/api/mcp"
154
+ }
155
+ }
156
+ }
157
+ ```
106
158
 
107
- ### 6. Query
159
+ See [MCP Server](#mcp-server) for authentication and other options.
160
+
161
+ ### Querying the API directly
162
+
163
+ The search API is also available via HTTP and CLI:
108
164
 
109
- **Via API:**
110
165
  ```bash
166
+ # cURL
111
167
  curl -X POST http://localhost:5173/api/search \
112
168
  -H "content-type: application/json" \
113
169
  -d '{"q":"getting started","topK":5,"groupBy":"page"}'
114
- ```
115
-
116
- **Via client library:**
117
- ```ts
118
- import { createSearchClient } from "searchsocket/client";
119
170
 
120
- const client = createSearchClient(); // defaults to /api/search
121
- const response = await client.search({
122
- q: "getting started",
123
- topK: 5,
124
- groupBy: "page",
125
- pathPrefix: "/docs"
126
- });
171
+ # CLI
172
+ pnpm searchsocket search --q "getting started" --top-k 5
127
173
  ```
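The cURL request above can also be issued from TypeScript. The following is a minimal sketch, assuming the dev server from the example (`http://localhost:5173`) and the request fields shown there (`q`, `topK`, `groupBy`); the helper names `buildSearchRequest` and `search` are hypothetical and not part of the package.

```ts
// Hypothetical helpers that mirror the cURL example; not part of searchsocket.
interface SearchRequestBody {
  q: string;
  topK?: number;
  groupBy?: "page" | "chunk";
}

// Build the URL path and fetch options for a POST to the search endpoint.
function buildSearchRequest(body: SearchRequestBody, endpoint = "/api/search") {
  return {
    url: endpoint,
    init: {
      method: "POST",
      headers: { "content-type": "application/json" },
      body: JSON.stringify(body),
    },
  };
}

// Send the request and return the parsed JSON response.
async function search(body: SearchRequestBody, baseUrl = "http://localhost:5173") {
  const { url, init } = buildSearchRequest(body);
  const res = await fetch(new URL(url, baseUrl), init);
  if (!res.ok) throw new Error(`Search request failed with status ${res.status}`);
  return res.json();
}
```

Calling `search({ q: "getting started", topK: 5, groupBy: "page" })` against a running dev server should return a response in the format described in the next section.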
128
174
 
129
- **Via CLI:**
130
- ```bash
131
- pnpm searchsocket search --q "getting started" --top-k 5 --path-prefix /docs
132
- ```
175
+ ### Response format
176
+
177
+ With `groupBy: "page"` (the default):
133
178
 
134
- **Response** (with `groupBy: "page"`, the default):
135
179
  ```json
136
180
  {
137
181
  "q": "getting started",
@@ -161,18 +205,16 @@ pnpm searchsocket search --q "getting started" --top-k 5 --path-prefix /docs
161
205
  }
162
206
  ],
163
207
  "meta": {
164
- "timingsMs": { "embed": 120, "vector": 15, "rerank": 0, "total": 135 },
165
- "usedRerank": false,
166
- "modelId": "jina-embeddings-v5-text-small"
208
+ "timingsMs": { "total": 135 }
167
209
  }
168
210
  }
169
211
  ```
170
212
 
171
- The `chunks` array appears when a page has multiple matching chunks above the `minChunkScoreRatio` threshold. Use `groupBy: "chunk"` for flat per-chunk results without page aggregation.
213
+ The `chunks` array contains matching sections within each page. Use `groupBy: "chunk"` for flat per-chunk results without page aggregation.
172
214
 
173
215
  ## Source Modes
174
216
 
175
- SearchSocket supports four source modes for loading pages to index.
217
+ SearchSocket supports four ways to load your site content for indexing.
176
218
 
177
219
  ### `static-output` (default)
178
220
 
@@ -182,50 +224,37 @@ Reads prerendered HTML files from SvelteKit's build output directory.
182
224
  export default {
183
225
  source: {
184
226
  mode: "static-output",
185
- staticOutputDir: "build"
227
+ staticOutputDir: "build" // default
186
228
  }
187
229
  };
188
230
  ```
189
231
 
190
- Best for: Sites with fully prerendered pages. Run `vite build` first, then index.
232
+ Best for fully prerendered sites. Run `vite build` first, then `searchsocket index`.
191
233
 
192
234
  ### `build`
193
235
 
194
- Discovers routes automatically from SvelteKit's build manifest and renders them via an ephemeral `vite preview` server. No manual route configuration needed.
236
+ Discovers routes from SvelteKit's build manifest and renders via an ephemeral `vite preview` server. No manual route lists needed.
195
237
 
196
238
  ```ts
197
239
  export default {
198
240
  source: {
241
+ mode: "build",
199
242
  build: {
200
- outputDir: ".svelte-kit/output", // default
201
- previewTimeout: 30000, // ms to wait for server (default)
202
- exclude: ["/api/*", "/admin/*"], // glob patterns to skip
203
- paramValues: { // values for dynamic routes
243
+ exclude: ["/api/*", "/admin/*"],
244
+ paramValues: {
204
245
  "/blog/[slug]": ["hello-world", "getting-started"],
205
246
  "/docs/[category]/[page]": ["guides/quickstart", "api/search"]
206
247
  },
207
- discover: true, // crawl internal links to find pages (default: false)
208
- seedUrls: ["/"], // starting URLs for discovery
209
- maxPages: 200, // max pages to discover (default: 200)
210
- maxDepth: 5 // max link depth from seed URLs (default: 5)
248
+ discover: true, // crawl internal links to find more pages
249
+ seedUrls: ["/"],
250
+ maxPages: 200,
251
+ maxDepth: 5
211
252
  }
212
253
  }
213
254
  };
214
255
  ```
215
256
 
216
- Best for: CI/CD pipelines. Enables `vite build && searchsocket index` with zero route configuration.
217
-
218
- **How it works**:
219
- 1. Parses `.svelte-kit/output/server/manifest-full.js` to discover all page routes
220
- 2. Expands dynamic routes using `paramValues` (skips dynamic routes without values)
221
- 3. Starts an ephemeral `vite preview` server on a random port
222
- 4. Fetches all routes concurrently for SSR-rendered HTML
223
- 5. Provides exact route-to-file mapping (no heuristic matching needed)
224
- 6. Shuts down the preview server
225
-
226
- **Dynamic routes**: Each key in `paramValues` maps to a route ID (e.g., `/blog/[slug]`) or its URL equivalent. Each value in the array replaces all `[param]` segments in the URL. Routes with layout groups like `/(app)/blog/[slug]` also match the URL key `/blog/[slug]`.
227
-
228
- **Link discovery**: Enable `discover: true` to automatically find pages by crawling internal links from `seedUrls`. This is useful when dynamic routes have many parameter values that are impractical to enumerate. The crawler respects `maxPages` and `maxDepth` limits and only follows links within the same origin.
257
+ Best for CI/CD pipelines: `vite build && searchsocket index` with zero route configuration.
229
258
 
230
259
  ### `crawl`
231
260
 
@@ -234,24 +263,24 @@ Fetches pages from a running HTTP server.
234
263
  ```ts
235
264
  export default {
236
265
  source: {
266
+ mode: "crawl",
237
267
  crawl: {
238
268
  baseUrl: "http://localhost:4173",
239
- routes: ["/", "/docs", "/blog"], // explicit routes
240
- sitemapUrl: "https://example.com/sitemap.xml" // or discover via sitemap
269
+ routes: ["/", "/docs", "/blog"],
270
+ sitemapUrl: "https://example.com/sitemap.xml"
241
271
  }
242
272
  }
243
273
  };
244
274
  ```
245
275
 
246
- If `routes` is omitted and no `sitemapUrl` is set, defaults to crawling `["/"]` only.
247
-
248
276
  ### `content-files`
249
277
 
250
- Reads markdown and svelte source files directly, without building or serving.
278
+ Reads markdown and Svelte source files directly, without building or serving.
251
279
 
252
280
  ```ts
253
281
  export default {
254
282
  source: {
283
+ mode: "content-files",
255
284
  contentFiles: {
256
285
  globs: ["src/routes/**/*.md", "content/**/*.md"],
257
286
  baseDir: "."
@@ -262,65 +291,132 @@ export default {
262
291
 
263
292
  ## Client Library
264
293
 
265
- SearchSocket exports a lightweight client for browser-side search:
294
+ ### `createSearchClient(options?)`
295
+
296
+ Lightweight browser-side search client.
266
297
 
267
298
  ```ts
268
299
  import { createSearchClient } from "searchsocket/client";
269
300
 
270
301
  const client = createSearchClient({
271
- endpoint: "/api/search", // default
272
- fetchImpl: fetch // default; override for SSR or testing
302
+ endpoint: "/api/search", // default
303
+ fetchImpl: fetch // override for SSR or testing
273
304
  });
274
305
 
275
- const response = await client.search({
306
+ const { results } = await client.search({
276
307
  q: "deployment guide",
277
308
  topK: 8,
278
309
  groupBy: "page",
279
310
  pathPrefix: "/docs",
280
311
  tags: ["guide"],
281
- rerank: true
312
+ filters: { version: 2 },
313
+ maxSubResults: 3
282
314
  });
315
+ ```
283
316
 
284
- for (const result of response.results) {
285
- console.log(result.url, result.title, result.score);
286
- if (result.chunks) {
287
- for (const chunk of result.chunks) {
288
- console.log(" ", chunk.sectionTitle, chunk.score);
289
- }
290
- }
291
- }
317
+ ### `buildResultUrl(result)`
318
+
319
+ Builds a URL from a search result that includes scroll-to-text metadata:
320
+
321
+ - `_ssk` query parameter — section title for SvelteKit client-side navigation
322
+ - `_sskt` query parameter — text target snippet for precise scroll
323
+ - `#:~:text=` — [Text Fragment](https://developer.mozilla.org/en-US/docs/Web/URI/Fragment/Text_fragments) for native browser scroll on full page loads
324
+
325
+ ```ts
326
+ import { buildResultUrl } from "searchsocket/client";
327
+
328
+ const href = buildResultUrl(result);
329
+ // "/docs/getting-started?_ssk=Installation&_sskt=Install+with+pnpm#:~:text=Install%20with%20pnpm"
292
330
  ```
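To make the URL shape concrete, here is a rough sketch of how such a URL could be assembled. It is not the library's implementation: the function name `sketchBuildResultUrl` and the `textTarget` field are assumptions for illustration, and returning the plain URL when no section title is present is also an assumption.

```ts
// Illustrative only; the real buildResultUrl may differ in field names and edge cases.
interface ResultLike {
  url: string;
  sectionTitle?: string;
  textTarget?: string; // hypothetical field holding the text snippet to scroll to
}

function sketchBuildResultUrl(result: ResultLike): string {
  // Assumption: without a section title there is nothing to scroll to.
  if (!result.sectionTitle) return result.url;

  const params = new URLSearchParams();
  params.set("_ssk", result.sectionTitle);
  if (result.textTarget) params.set("_sskt", result.textTarget);

  // Text Fragment for native browser scrolling on full page loads.
  const fragment = result.textTarget
    ? `#:~:text=${encodeURIComponent(result.textTarget)}`
    : "";

  return `${result.url}?${params.toString()}${fragment}`;
}
```

`URLSearchParams` encodes spaces as `+` while `encodeURIComponent` uses `%20`, which matches the mix of encodings in the example URL above.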
293
331
 
294
- ## Scroll-to-Text Navigation
332
+ ## Svelte 5 Integration
295
333
 
296
- When a visitor clicks a search result, SearchSocket can automatically scroll them to the relevant section on the destination page. This uses two utilities:
334
+ ### `createSearch(options?)`
297
335
 
298
- ### `buildResultUrl(result)`
336
+ A reactive search store built on Svelte 5 runes with debouncing and LRU caching.
299
337
 
300
- Builds a URL from a search result that includes:
301
- - A `_ssk` query parameter for SvelteKit client-side navigation (read by `searchsocketScrollToText`)
302
- - A [Text Fragment](https://developer.mozilla.org/en-US/docs/Web/URI/Fragment/Text_fragments) (`#:~:text=`) for native browser scroll-to-text on full page loads (Chrome 80+, Safari 16.1+, Firefox 131+)
338
+ ```svelte
339
+ <script>
340
+ import { createSearch } from "searchsocket/svelte";
341
+ import { buildResultUrl } from "searchsocket/client";
342
+
343
+ const search = createSearch({
344
+ endpoint: "/api/search",
345
+ debounce: 250, // ms (default)
346
+ cache: true, // LRU result caching (default)
347
+ cacheSize: 50, // max cached queries (default)
348
+ topK: 10,
349
+ groupBy: "page",
350
+ pathPrefix: "/docs" // scope search to a section
351
+ });
352
+ </script>
303
353
 
304
- Import from `searchsocket/client`:
354
+ <input bind:value={search.query} placeholder="Search docs..." />
355
+
356
+ {#if search.loading}
357
+ <p>Searching...</p>
358
+ {/if}
359
+
360
+ {#if search.error}
361
+ <p class="error">{search.error.message}</p>
362
+ {/if}
363
+
364
+ {#each search.results as result}
365
+ <a href={buildResultUrl(result)}>
366
+ <strong>{result.title}</strong>
367
+ {#if result.sectionTitle}
368
+ <span>— {result.sectionTitle}</span>
369
+ {/if}
370
+ </a>
371
+ <p>{result.snippet}</p>
372
+ {/each}
373
+ ```
305
374
 
306
- ```ts
307
- import { createSearchClient, buildResultUrl } from "searchsocket/client";
375
+ Call `search.destroy()` to clean up when no longer needed (automatic in component context).
308
376
 
309
- const client = createSearchClient();
310
- const { results } = await client.search({ q: "installation" });
377
+ ### `<SearchSocket>` component
311
378
 
312
- // Use in your search UI
313
- for (const result of results) {
314
- const href = buildResultUrl(result);
315
- // "/docs/getting-started?_ssk=Installation#:~:text=Installation"
316
- }
379
+ Declarative meta tag component for controlling per-page search behavior:
380
+
381
+ ```svelte
382
+ <script>
383
+ import { SearchSocket } from "searchsocket/svelte";
384
+ </script>
385
+
386
+ <!-- Boost this page's search ranking -->
387
+ <SearchSocket weight={1.2} />
388
+
389
+ <!-- Exclude from search -->
390
+ <SearchSocket noindex />
391
+
392
+ <!-- Add filterable tags -->
393
+ <SearchSocket tags={["guide", "advanced"]} />
394
+
395
+ <!-- Add structured metadata (filterable via search API) -->
396
+ <SearchSocket meta={{ version: 2, category: "api" }} />
397
+ ```
398
+
399
+ The component renders `<meta>` tags in `<svelte:head>` that SearchSocket reads during indexing.
400
+
401
+ ### Template components
402
+
403
+ Copy ready-made search UI components into your project:
404
+
405
+ ```bash
406
+ pnpm searchsocket add search-dialog
407
+ pnpm searchsocket add search-input
408
+ pnpm searchsocket add search-results
317
409
  ```
318
410
 
319
- If the result has no `sectionTitle`, the original URL is returned unchanged.
411
+ These are Svelte 5 components copied to `src/lib/components/search/` (configurable via `--dir`). They're starting points to customize, not dependencies.
412
+
413
+ ## Scroll-to-Text Navigation
414
+
415
+ When a user clicks a search result, SearchSocket scrolls them to the matching section on the destination page.
320
416
 
321
- ### `searchsocketScrollToText`
417
+ ### Setup
322
418
 
323
- A SvelteKit `afterNavigate` hook that reads the `_ssk` parameter and scrolls the matching heading into view. Add it to your root layout:
419
+ Add the scroll handler to your root layout:
324
420
 
325
421
  ```svelte
326
422
  <!-- src/routes/+layout.svelte -->
@@ -332,489 +428,627 @@ A SvelteKit `afterNavigate` hook that reads the `_ssk` parameter and scrolls the
332
428
  </script>
333
429
  ```
334
430
 
335
- The hook:
336
- - Matches headings (h1–h6) case-insensitively with whitespace normalization
337
- - Falls back to a broader text node search if no heading matches
338
- - Scrolls smoothly to the first match
339
- - Is a silent no-op when `_ssk` is absent or no match is found
431
+ ### How it works
340
432
 
341
- ## Vector Backend: Turso/libSQL
433
+ 1. `buildResultUrl()` encodes the section title and text snippet into the URL
434
+ 2. On SvelteKit client-side navigation, the `afterNavigate` hook reads `_ssk`/`_sskt` params
435
+ 3. A TreeWalker-based text mapper finds the exact position in the DOM
436
+ 4. The page scrolls smoothly to the match
437
+ 5. The matching text is highlighted using the [CSS Custom Highlight API](https://developer.mozilla.org/en-US/docs/Web/API/CSS_Custom_Highlight_API) (with a DOM fallback for older browsers)
438
+ 6. On full page loads, browsers that support Text Fragments (`#:~:text=`) handle scrolling natively
342
439
 
343
- SearchSocket uses **Turso** (libSQL) as its single vector backend, providing a unified experience across development and production.
440
+ The highlight fades after 2 seconds. Customize with CSS:
344
441
 
345
- ### Local Development
442
+ ```css
443
+ ::highlight(ssk-highlight) {
444
+ background-color: rgba(250, 204, 21, 0.4);
445
+ }
446
+ ```
346
447
 
347
- By default, SearchSocket uses a **local file database**:
348
- - Path: `.searchsocket/vectors.db` (configurable)
349
- - No account or API keys needed
350
- - Full vector search with `libsql_vector_idx` and `vector_top_k`
351
- - Perfect for local development and CI testing
448
+ ## Search & Ranking
352
449
 
353
- ### Production (Remote Turso)
450
+ ### Dual search
354
451
 
355
- For production, switch to **Turso's hosted service**:
452
+ By default, SearchSocket runs two parallel queries — one against page-level summaries and one against individual chunks — then blends the scores:
356
453
 
357
- 1. **Sign up for Turso** (free tier available):
358
- ```bash
359
- # Install Turso CLI
360
- brew install tursodatabase/tap/turso
454
+ ```ts
455
+ export default {
456
+ search: {
457
+ dualSearch: true, // default
458
+ pageSearchWeight: 0.3 // weight of page results vs chunks (0-1)
459
+ }
460
+ };
461
+ ```
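The README does not spell out the blending formula. One plausible reading of `pageSearchWeight` is a linear interpolation between the chunk score and the page score, sketched below. The function `blendScores` is hypothetical and the formula is an assumption, not the documented behavior.

```ts
// Assumed formula: linear interpolation between chunk-level and page-level scores.
// The actual blending in SearchSocket may differ.
function blendScores(chunkScore: number, pageScore: number, pageSearchWeight: number): number {
  if (pageSearchWeight < 0 || pageSearchWeight > 1) {
    throw new RangeError("pageSearchWeight must be between 0 and 1");
  }
  return (1 - pageSearchWeight) * chunkScore + pageSearchWeight * pageScore;
}
```

Under this reading, a weight of `0` ignores the page-level query and a weight of `1` ignores the chunk-level query.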
361
462
 
362
- # Sign up
363
- turso auth signup
463
+ ### Page aggregation
364
464
 
365
- # Create a database
366
- turso db create searchsocket-prod
465
+ With `groupBy: "page"` (default), chunk results are grouped by page URL:
367
466
 
368
- # Get credentials
369
- turso db show searchsocket-prod --url
370
- turso db tokens create searchsocket-prod
371
- ```
467
+ 1. The top chunk score becomes the base page score
468
+ 2. Additional matching chunks add a decaying bonus: `chunk_score * decay^i`
469
+ 3. Per-URL page weights are applied multiplicatively
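The three steps above can be sketched as a small function. This is a simplified illustration, assuming that `i` starts at 1 for the first additional chunk and that the bonus is added unscaled; the real implementation may also scale the bonus (for example by `weights.aggregation`). The name `aggregatePageScore` is hypothetical.

```ts
// Simplified sketch of page aggregation; not the library's actual code.
interface AggregationOptions {
  aggregationCap: number;   // max chunks contributing to the page score
  aggregationDecay: number; // decay factor for additional chunks
  pageWeight: number;       // per-URL multiplier (1 = neutral)
}

function aggregatePageScore(chunkScores: number[], opts: AggregationOptions): number {
  // Consider only the best-scoring chunks, up to the cap.
  const sorted = [...chunkScores].sort((a, b) => b - a).slice(0, opts.aggregationCap);
  if (sorted.length === 0) return 0;

  // Step 1: the top chunk score is the base page score.
  const [top, ...rest] = sorted;

  // Step 2: each additional chunk adds chunk_score * decay^i (i = 1, 2, ...).
  const bonus = rest.reduce(
    (sum, score, index) => sum + score * opts.aggregationDecay ** (index + 1),
    0
  );

  // Step 3: apply the per-URL page weight multiplicatively.
  return (top + bonus) * opts.pageWeight;
}
```

With chunk scores `[0.8, 0.4, 0.2]`, a decay of `0.5`, and a neutral weight, this sketch gives `0.8 + 0.4 * 0.5 + 0.2 * 0.25`.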
372
470
 
373
- 2. **Set environment variables**:
374
- ```bash
375
- TURSO_DATABASE_URL=libsql://searchsocket-prod-xxx.turso.io
376
- TURSO_AUTH_TOKEN=eyJhbGc...
377
- ```
471
+ ### Ranking configuration
378
472
 
379
- 3. **Index normally** — SearchSocket auto-detects the remote URL and uses it.
473
+ ```ts
474
+ export default {
475
+ ranking: {
476
+ enableIncomingLinkBoost: true, // boost pages with more internal links pointing to them
477
+ enableDepthBoost: true, // boost shallower pages (/ > /docs > /docs/api)
478
+ enableFreshnessBoost: false, // boost recently published content
479
+ enableAnchorTextBoost: false, // boost pages whose link text matches the query
380
480
 
381
- ### Direct Credential Passing
481
+ pageWeights: { // per-URL score multipliers (prefix matching)
482
+ "/": 0.95,
483
+ "/docs": 1.15,
484
+ "/download": 1.05
485
+ },
382
486
 
383
- Instead of environment variables, you can pass credentials directly in the config. This is useful for serverless deployments or multi-tenant setups:
487
+ aggregationCap: 5, // max chunks contributing to page score
488
+ aggregationDecay: 0.5, // decay for additional chunks
489
+ minScoreRatio: 0.70, // drop results below 70% of best score
490
+ scoreGapThreshold: 0.4, // trim results >40% below best
491
+ minChunkScoreRatio: 0.5, // threshold for sub-chunks
384
492
 
385
- ```ts
386
- export default {
387
- embeddings: {
388
- apiKey: "jina_..." // direct API key (takes precedence over apiKeyEnv)
389
- },
390
- vector: {
391
- turso: {
392
- url: "libsql://my-db.turso.io", // direct URL
393
- authToken: "eyJhbGc..." // direct auth token
493
+ weights: {
494
+ incomingLinks: 0.05,
495
+ depth: 0.03,
496
+ aggregation: 0.1,
497
+ titleMatch: 0.15,
498
+ freshness: 0.1,
499
+ anchorText: 0.10
394
500
  }
395
501
  }
396
502
  };
397
503
  ```
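As an illustration of the `minScoreRatio` comment in the config above, the following sketch drops results that score below a fraction of the best result. The helper `applyMinScoreRatio` is hypothetical; the library's actual trimming logic (including how it interacts with `scoreGapThreshold`) is not shown in this README.

```ts
// Hypothetical helper illustrating the minScoreRatio cutoff.
function applyMinScoreRatio<T extends { score: number }>(
  results: T[],
  minScoreRatio: number
): T[] {
  if (results.length === 0) return [];
  const best = Math.max(...results.map((r) => r.score));
  // Keep only results scoring at least minScoreRatio * best.
  return results.filter((r) => r.score >= best * minScoreRatio);
}
```

With `minScoreRatio: 0.70` and a best score of `1.0`, a result scoring `0.75` is kept and one scoring `0.6` is dropped.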
398
504
 
399
- Direct values take precedence over environment variable lookups (`apiKeyEnv`, `urlEnv`, `authTokenEnv`).
505
+ Use gentle `pageWeights` values (0.9–1.2) since they compound with other boosts.
400
506
 
401
- ### Dimension Mismatch Auto-Recovery
402
-
403
- When switching embedding models (e.g., from a 1536-dim model to Jina's 1024-dim), the vector dimension changes. SearchSocket automatically detects this and recreates the chunks table with the new dimension — no manual intervention needed. A full re-index (`--force`) is still required after switching models.
507
+ ## Build-Triggered Indexing
404
508
 
405
- ### Why Turso?
509
+ The recommended workflow is to index automatically on every deploy. Add the Vite plugin to your config:
406
510
 
407
- - **Single backend** — one unified Turso/libSQL store for vectors, metadata, and state
408
- - **Local-first development** — zero external dependencies for local dev
409
- - **Production-ready**: same codebase scales to remote hosted DB
410
- - **Cost-effective**: Turso free tier includes 9GB storage, 500M row reads/month
411
- - **Vector search native** — `F32_BLOB` vectors, cosine similarity index, `vector_top_k` ANN queries
511
+ ```ts
512
+ // vite.config.ts
513
+ import { sveltekit } from "@sveltejs/kit/vite";
514
+ import { searchsocketVitePlugin } from "searchsocket/sveltekit";
412
515
 
413
- ## Serverless Deployment (Vercel, Netlify, etc.)
516
+ export default {
517
+ plugins: [
518
+ sveltekit(),
519
+ searchsocketVitePlugin({
520
+ changedOnly: true, // incremental indexing (default)
521
+ verbose: true
522
+ })
523
+ ]
524
+ };
525
+ ```
414
526
 
415
- SearchSocket works on serverless platforms with a few adjustments:
527
+ ### Vercel / Cloudflare / Netlify
416
528
 
417
- ### Requirements
529
+ Set these environment variables in your hosting platform:
418
530
 
419
- 1. **Remote Turso database** — local SQLite is not available in serverless (no persistent filesystem). Set `TURSO_DATABASE_URL` and `TURSO_AUTH_TOKEN` as platform environment variables.
531
+ | Variable | Value |
532
+ |----------|-------|
533
+ | `UPSTASH_VECTOR_REST_URL` | Your Upstash Vector REST URL |
534
+ | `UPSTASH_VECTOR_REST_TOKEN` | Your Upstash Vector REST token |
535
+ | `SEARCHSOCKET_AUTO_INDEX` | `1` |
420
536
 
421
- 2. **Inline config via `rawConfig`**: the default config loader uses `jiti` to import `searchsocket.config.ts` from disk, which isn't bundled in serverless. Use `rawConfig` to pass config inline:
537
+ Every deploy will build your site, index the content into Upstash, and serve the search API and MCP endpoint, fully automated.
422
538
 
423
- ```ts
424
- // hooks.server.ts (Vercel / Netlify)
425
- import { searchsocketHandle } from "searchsocket/sveltekit";
539
+ ### Environment variable control
426
540
 
427
- export const handle = searchsocketHandle({
428
- rawConfig: {
429
- project: { id: "my-docs-site" },
430
- source: { mode: "static-output" },
431
- embeddings: { apiKeyEnv: "JINA_API_KEY" },
432
- }
433
- });
434
- ```
541
+ ```bash
542
+ # Enable indexing on build
543
+ SEARCHSOCKET_AUTO_INDEX=1 pnpm build
435
544
 
436
- 3. **Environment variables** — set these on your platform dashboard:
437
- - `JINA_API_KEY`
438
- - `TURSO_DATABASE_URL`
439
- - `TURSO_AUTH_TOKEN`
545
+ # Disable temporarily
546
+ SEARCHSOCKET_DISABLE_AUTO_INDEX=1 pnpm build
440
547
 
441
- ### Rate Limiting
548
+ # Force full rebuild (ignore incremental cache)
549
+ SEARCHSOCKET_FORCE_REINDEX=1 pnpm build
550
+ ```
442
551
 
443
- The built-in `InMemoryRateLimiter` auto-disables on serverless platforms (it resets on every cold start). Use your platform's WAF or edge rate-limiting instead.
552
+ ## Making Images Searchable
444
553
 
445
- ### What Only Applies to Indexing
554
+ SearchSocket converts images to text during extraction using this priority chain:
446
555
 
447
- The following features are only used during `searchsocket index` (CLI), not the search handler:
448
- - `ensureStateDirs` creates `.searchsocket/` state directories
449
- - Local SQLite fallback — only needed when `TURSO_DATABASE_URL` is not set
556
+ 1. `data-search-description` on the `<img>` (your explicit description)
557
+ 2. `data-search-description` on the parent `<figure>`
558
+ 3. `alt` text + `<figcaption>` combined
559
+ 4. `alt` text alone (filters generic words like "image", "icon")
560
+ 5. `<figcaption>` alone
561
+ 6. Removed — images with no useful text are dropped
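The priority chain above can be sketched as a pure function. This is an illustration rather than the extractor's code: the `ImageContext` shape, the `resolveImageText` name, the `": "` separator used when combining alt text with a caption, and applying the generic-word filter before step 3 are all assumptions. Only "image" and "icon" are named in the list above; the real filter list may be longer.

```ts
// Hypothetical input shape: attribute values already read from the DOM.
interface ImageContext {
  imgDescription?: string;    // data-search-description on <img>
  figureDescription?: string; // data-search-description on the parent <figure>
  alt?: string;
  figcaption?: string;
}

// Generic alt words named in the list above; assumed to be a partial list.
const GENERIC_ALT_WORDS = new Set(["image", "icon"]);

function resolveImageText(ctx: ImageContext): string | null {
  const clean = (value?: string) => (value ?? "").trim();

  const imgDescription = clean(ctx.imgDescription);
  if (imgDescription) return imgDescription;          // 1. explicit description on <img>

  const figureDescription = clean(ctx.figureDescription);
  if (figureDescription) return figureDescription;    // 2. description on <figure>

  const rawAlt = clean(ctx.alt);
  const alt = GENERIC_ALT_WORDS.has(rawAlt.toLowerCase()) ? "" : rawAlt;
  const caption = clean(ctx.figcaption);

  if (alt && caption) return `${alt}: ${caption}`;    // 3. alt + figcaption combined
  if (alt) return alt;                                // 4. alt alone
  if (caption) return caption;                        // 5. figcaption alone
  return null;                                        // 6. no useful text: drop the image
}
```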
450
562
 
451
- ### Adapter Guidance
563
+ ```html
564
+ <img
565
+ src="/screenshots/settings.png"
566
+ alt="Settings page"
567
+ data-search-description="The settings page showing API key configuration, theme selection, and notification preferences"
568
+ />
569
+ ```
452
570
 
453
- | Platform | Adapter | Notes |
454
- |----------|---------|-------|
455
- | Vercel | `adapter-auto` (default) | Serverless — use `rawConfig` + remote Turso |
456
- | Netlify | `adapter-netlify` | Serverless — same as Vercel |
457
- | VPS / Docker | `adapter-node` | Long-lived process — no limitations, local SQLite works |
571
+ Works with SvelteKit's `enhanced:img`:
458
572
 
459
- ## Embeddings: Jina AI
573
+ ```svelte
574
+ <enhanced:img
575
+ src="./screenshots/dashboard.png"
576
+ alt="Dashboard"
577
+ data-search-description="Main dashboard showing active projects and indexing status"
578
+ />
579
+ ```
460
580
 
461
- SearchSocket uses **Jina AI's embedding models** to convert text into semantic vectors. A single `JINA_API_KEY` powers both embeddings and optional reranking.
581
+ ## MCP Server
462
582
 
463
- ### Default Model
583
+ SearchSocket includes an MCP server that gives Claude Code, Claude Desktop, and other MCP clients direct access to your site's search index. The MCP endpoint is built into `searchsocketHandle()` — once your site is deployed, any MCP client can connect to it over HTTP.
464
584
 
465
- - **Model**: `jina-embeddings-v5-text-small`
466
- - **Dimensions**: 1024 (default)
467
- - **Cost**: ~$0.00005 per 1K tokens
468
- - **Task adapters**: Uses `retrieval.passage` for indexing, `retrieval.query` for search queries (LoRA task-specific adapters for better retrieval quality)
585
+ ### Available tools
469
586
 
470
- ### How It Works
587
+ | Tool | Description |
588
+ |------|-------------|
589
+ | `search` | Semantic search with filtering, grouping, and reranking |
590
+ | `get_page` | Retrieve full page markdown with frontmatter |
591
+ | `list_pages` | Cursor-paginated page listing |
592
+ | `get_site_structure` | Hierarchical page tree |
593
+ | `find_source_file` | Locate the SvelteKit source file for content |
594
+ | `get_related_pages` | Find related pages by links, semantics, and structure |
471
595
 
472
- 1. **Chunking**: Text is split into semantic chunks (default 2200 chars, 200 overlap)
473
- 2. **Title Prepend**: Page title is prepended to each chunk for better context (`chunking.prependTitle`, default: true)
474
- 3. **Summary Chunk**: A synthetic identity chunk is generated per page with title, URL, and first paragraph (`chunking.pageSummaryChunk`, default: true)
475
- 4. **Embedding**: Each chunk is sent to Jina's embedding API with the `retrieval.passage` task adapter
476
- 5. **Batching**: Requests batched (64 texts per request) for efficiency
477
- 6. **Storage**: Vectors stored in Turso with metadata (URL, title, tags, depth, etc.)
596
+ ### Connecting to your deployed site
478
597
 
479
- ### Cost Estimation
598
+ The recommended setup is to connect Claude Code to your deployed site's MCP endpoint. This way the index stays up to date automatically as you deploy, and there's no local process to manage.
480
599
 
481
- Use `--dry-run` to preview costs:
482
- ```bash
483
- pnpm searchsocket index --dry-run
484
- ```
600
+ Add `.mcp.json` to your project root:
485
601
 
486
- Output:
487
- ```
488
- pages processed: 42
489
- chunks total: 156
490
- chunks changed: 156
491
- embeddings created: 156
492
- estimated tokens: 32,400
493
- estimated cost (USD): $0.000648
602
+ ```json
603
+ {
604
+ "mcpServers": {
605
+ "searchsocket": {
606
+ "type": "http",
607
+ "url": "https://your-site.com/api/mcp"
608
+ }
609
+ }
610
+ }
494
611
  ```
495
612
 
496
- ### Reranking
613
+ That's it. Restart Claude Code and the six search tools are available. You can search your docs, retrieve page content, and find source files directly from the AI assistant.
497
614
 
498
- Since embeddings and reranking share the same Jina API key, enabling reranking is one boolean:
615
+ To protect the endpoint, add API key authentication:
499
616
 
500
617
  ```ts
501
- export default {
502
- embeddings: { apiKeyEnv: "JINA_API_KEY" },
503
- rerank: { enabled: true }
504
- };
618
+ // src/hooks.server.ts
619
+ export const handle = searchsocketHandle({
620
+ rawConfig: {
621
+ mcp: {
622
+ handle: {
623
+ apiKey: process.env.SEARCHSOCKET_MCP_API_KEY
624
+ }
625
+ }
626
+ }
627
+ });
505
628
  ```
506
629
 
507
- **Note**: Changing the model after indexing requires re-indexing with `--force`.
630
+ Then pass the key in `.mcp.json`:
508
631
 
509
- ## Search & Ranking
632
+ ```json
633
+ {
634
+ "mcpServers": {
635
+ "searchsocket": {
636
+ "type": "http",
637
+ "url": "https://your-site.com/api/mcp",
638
+ "headers": {
639
+ "Authorization": "Bearer ${SEARCHSOCKET_MCP_API_KEY}"
640
+ }
641
+ }
642
+ }
643
+ }
644
+ ```
510
645
 
511
- ### Page Aggregation
646
+ The `${SEARCHSOCKET_MCP_API_KEY}` syntax references an environment variable so you don't hardcode secrets in `.mcp.json`.
512
647
 
513
- By default (`groupBy: "page"`), SearchSocket groups chunk results by page URL and computes a page-level score:
648
+ ### Auto-approving in Claude Code
514
649
 
515
- 1. The top chunk score becomes the base page score
516
- 2. Additional matching chunks contribute a decaying bonus: `chunk_score * decay^i`
517
- 3. Optional per-URL page weights are applied multiplicatively
650
+ Skip the approval prompt each time a tool is called:
518
651
 
519
- Configure aggregation behavior:
652
+ ```json
653
+ {
654
+ "allowedMcpServers": [
655
+ { "serverName": "searchsocket" }
656
+ ]
657
+ }
658
+ ```
520
659
 
521
- ```ts
522
- export default {
523
- ranking: {
524
- minScore: 0, // minimum absolute score to include in results (default: 0, disabled)
525
- aggregationCap: 5, // max chunks contributing to page score (default: 5)
526
- aggregationDecay: 0.5, // decay factor for additional chunks (default: 0.5)
527
- minChunkScoreRatio: 0.5, // threshold for sub-chunks in results (default: 0.5)
528
- pageWeights: { // per-URL score multipliers
529
- "/": 1.1,
530
- "/docs": 1.15,
531
- "/download": 1.2
532
- },
533
- weights: {
534
- aggregation: 0.1, // weight of aggregation bonus (default: 0.1)
535
- incomingLinks: 0.05, // incoming link boost weight (default: 0.05)
536
- depth: 0.03, // URL depth boost weight (default: 0.03)
537
- rerank: 1.0 // reranker score weight (default: 1.0)
660
+ Add this to `.claude/settings.json` in your project.
661
+
662
+ ### Local development
663
+
664
+ During local development, you can point to your dev server instead:
665
+
666
+ ```json
667
+ {
668
+ "mcpServers": {
669
+ "searchsocket": {
670
+ "type": "http",
671
+ "url": "http://localhost:5173/api/mcp"
538
672
  }
539
673
  }
540
- };
674
+ }
541
675
  ```
542
676
 
543
- `pageWeights` supports exact URL matches and prefix matching. A weight of `1.15` on `"/docs"` boosts all pages under `/docs/` by 15%. Use gentle values (1.05-1.2x) since they compound with aggregation.
677
+ ### Claude Desktop
678
+
679
+ Add to `~/Library/Application Support/Claude/claude_desktop_config.json`:
544
680
 
545
- `minScore` filters out low-relevance results before they reach the client. Set to a value like `0.3` to remove noise. In page mode, pages below the threshold are dropped; in chunk mode, individual chunks are filtered. Default is `0` (disabled).
681
+ ```json
682
+ {
683
+ "mcpServers": {
684
+ "searchsocket": {
685
+ "command": "npx",
686
+ "args": ["searchsocket", "mcp"],
687
+ "cwd": "/path/to/your/project"
688
+ }
689
+ }
690
+ }
691
+ ```
546
692
 
547
- ### Chunk Mode
693
+ ### Standalone HTTP server
548
694
 
549
- Use `groupBy: "chunk"` for flat per-chunk results without page aggregation:
695
+ Run the MCP server as a standalone process (outside SvelteKit):
550
696
 
551
697
  ```bash
552
- curl -X POST http://localhost:5173/api/search \
553
- -H "content-type: application/json" \
554
- -d '{"q":"vector search","topK":10,"groupBy":"chunk"}'
698
+ pnpm searchsocket mcp --transport http --port 3338
555
699
  ```
556
700
 
557
- ## Build-Triggered Indexing
701
+ ## llms.txt Generation
558
702
 
559
- Automatically index after each SvelteKit build.
703
+ Generate [llms.txt](https://llmstxt.org/) files during indexing — a standardized way to make your site content available to LLMs.
560
704
 
561
- **`vite.config.ts` or `svelte.config.js`:**
562
705
  ```ts
563
- import { searchsocketVitePlugin } from "searchsocket/sveltekit";
564
-
565
706
  export default {
566
- plugins: [
567
- svelteKitPlugin(),
568
- searchsocketVitePlugin({
569
- enabled: true, // or check process.env.SEARCHSOCKET_AUTO_INDEX
570
- changedOnly: true, // incremental indexing (faster)
571
- verbose: false
572
- })
573
- ]
707
+ project: {
708
+ baseUrl: "https://example.com"
709
+ },
710
+ llmsTxt: {
711
+ enable: true,
712
+ title: "My Project",
713
+ description: "Documentation for My Project",
714
+ outputPath: "static/llms.txt", // default
715
+ generateFull: true, // also generate llms-full.txt
716
+ serveMarkdownVariants: false // serve /page.md variants via the hook
717
+ }
574
718
  };
575
719
  ```
576
720
 
577
- **Environment control:**
578
- ```bash
579
- # Enable via env var
580
- SEARCHSOCKET_AUTO_INDEX=1 pnpm build
721
+ After indexing, `llms.txt` (page index with links) and `llms-full.txt` (full content) are written to your static directory and served by `searchsocketHandle()`.
581
722
 
582
- # Disable via env var
583
- SEARCHSOCKET_DISABLE_AUTO_INDEX=1 pnpm build
584
- ```
585
-
586
- ## Commands
723
+ ## CLI Commands
587
724
 
588
725
  ### `searchsocket init`
589
726
 
590
- Initialize config and state directory.
727
+ Initialize config and state directory. Creates `searchsocket.config.ts`, `.searchsocket/`, `.mcp.json`, and wires up your hooks and Vite config.
591
728
 
592
729
  ```bash
593
730
  pnpm searchsocket init
731
+ pnpm searchsocket init --non-interactive
594
732
  ```
595
733
 
596
734
  ### `searchsocket index`
597
735
 
598
- Index content into vectors.
736
+ Index content into Upstash Vector.
599
737
 
600
738
  ```bash
601
- # Incremental (only changed chunks)
602
- pnpm searchsocket index --changed-only
739
+ pnpm searchsocket index # incremental (default: --changed-only)
740
+ pnpm searchsocket index --force # full re-index
741
+ pnpm searchsocket index --source build # override source mode
742
+ pnpm searchsocket index --scope staging # override scope
743
+ pnpm searchsocket index --dry-run # preview without writing
744
+ pnpm searchsocket index --max-pages 10 # limit for testing
745
+ pnpm searchsocket index --verbose # detailed output
746
+ pnpm searchsocket index --json # machine-readable output
747
+ ```
603
748
 
604
- # Full re-index
605
- pnpm searchsocket index --force
749
+ ### `searchsocket search`
606
750
 
607
- # Preview cost without indexing
608
- pnpm searchsocket index --dry-run
751
+ CLI search for testing.
609
752
 
610
- # Override source mode
611
- pnpm searchsocket index --source build
753
+ ```bash
754
+ pnpm searchsocket search --q "getting started" --top-k 5
755
+ pnpm searchsocket search --q "api" --path-prefix /docs
756
+ ```
612
757
 
613
- # Limit for testing
614
- pnpm searchsocket index --max-pages 10 --max-chunks 50
758
+ ### `searchsocket dev`
615
759
 
616
- # Override scope
617
- pnpm searchsocket index --scope staging
760
+ Watch for file changes and auto-reindex, with optional playground UI.
618
761
 
619
- # Verbose output
620
- pnpm searchsocket index --verbose
762
+ ```bash
763
+ pnpm searchsocket dev # watch + playground at :3337
764
+ pnpm searchsocket dev --mcp --mcp-port 3338 # also start MCP HTTP server
765
+ pnpm searchsocket dev --no-playground # watch only
621
766
  ```
622
767
 
623
768
  ### `searchsocket status`
624
769
 
625
- Show indexing status, scope, and vector health.
770
+ Show indexing status and backend health.
626
771
 
627
772
  ```bash
628
773
  pnpm searchsocket status
774
+ ```
775
+
776
+ ### `searchsocket doctor`
629
777
 
630
- # Output:
631
- # project: my-site
632
- # resolved scope: main
633
- # embedding model: jina-embeddings-v5-text-small
634
- # vector backend: turso/libsql (local (.searchsocket/vectors.db))
635
- # vector health: ok
636
- # last indexed (main): 2025-02-23T10:30:00Z
637
- # tracked chunks: 156
638
- # last estimated tokens: 32,400
639
- # last estimated cost: $0.000648
778
+ Validate config, env vars, provider connectivity, and write access.
779
+
780
+ ```bash
781
+ pnpm searchsocket doctor
640
782
  ```
641
783
 
642
- ### `searchsocket dev`
784
+ ### `searchsocket test`
643
785
 
644
- Watch for file changes and auto-reindex.
786
+ Run search quality assertions against the live index.
645
787
 
646
788
  ```bash
647
- pnpm searchsocket dev
789
+ pnpm searchsocket test # uses searchsocket.test.json
790
+ pnpm searchsocket test --file custom-tests.json # custom test file
791
+ ```
792
+
793
+ Test file format:
648
794
 
649
- # With MCP server
650
- pnpm searchsocket dev --mcp --mcp-port 3338
795
+ ```json
796
+ [
797
+ {
798
+ "query": "installation guide",
799
+ "expect": {
800
+ "topResult": "/docs/getting-started",
801
+ "inTop5": ["/docs/getting-started", "/docs/quickstart"]
802
+ }
803
+ }
804
+ ]
651
805
  ```
652
806
 
653
- Watches:
654
- - `src/routes/**` (route files)
655
- - `build/` (if static-output mode)
656
- - Build output dir (if build mode)
657
- - Content files (if content-files mode)
658
- - `searchsocket.config.ts` (if crawl or build mode)
807
+ Reports pass/fail per assertion and Mean Reciprocal Rank (MRR) across all queries.
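For readers unfamiliar with the metric, here is a minimal sketch of how Mean Reciprocal Rank is conventionally computed. It is not taken from SearchSocket's source: the `QueryOutcome` type and function names are hypothetical, and which expected URLs the CLI scores against (`topResult`, `inTop5`, or both) is an assumption.

```typescript
// Hypothetical record of one test query after it has been run against the index.
interface QueryOutcome {
  query: string;
  expected: string[]; // URLs that count as a correct hit for this query
  ranked: string[];   // URLs returned by search, best first
}

// 1 / rank of the first expected URL, or 0 when none of them was returned.
function reciprocalRank(outcome: QueryOutcome): number {
  const index = outcome.ranked.findIndex((url) => outcome.expected.includes(url));
  return index === -1 ? 0 : 1 / (index + 1);
}

// Average of the reciprocal ranks over all queries (0 for an empty test file).
function meanReciprocalRank(outcomes: QueryOutcome[]): number {
  if (outcomes.length === 0) return 0;
  const total = outcomes.reduce((sum, outcome) => sum + reciprocalRank(outcome), 0);
  return total / outcomes.length;
}
```

An MRR of 1.0 means every query returned an expected page first; a value near 0.5 means the expected page typically appears second.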
659
808
 
660
809
  ### `searchsocket clean`
661
810
 
662
- Delete local state and optionally remote vectors.
811
+ Delete local state and optionally remote indexes.
663
812
 
664
813
  ```bash
665
- # Local state only
666
- pnpm searchsocket clean
667
-
668
- # Local + remote vectors
669
- pnpm searchsocket clean --remote --scope staging
814
+ pnpm searchsocket clean # local state only
815
+ pnpm searchsocket clean --remote # also delete remote scope
816
+ pnpm searchsocket clean --scope staging # specific scope
670
817
  ```
671
818
 
672
819
  ### `searchsocket prune`
673
820
 
674
- Delete stale scopes (e.g., deleted git branches).
821
+ List and delete stale scopes. Compares against git branches to find orphaned scopes.
675
822
 
676
823
  ```bash
677
- # Dry run (shows what would be deleted)
678
- pnpm searchsocket prune --older-than 30d
824
+ pnpm searchsocket prune # dry-run (default)
825
+ pnpm searchsocket prune --apply # actually delete
826
+ pnpm searchsocket prune --older-than 30d # only scopes older than 30 days
827
+ ```
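The orphan check described above could look roughly like the following. This is a sketch under stated assumptions, not the CLI's implementation: `ScopeInfo`, `findStaleScopes`, and the rule that `--older-than` narrows the orphan list (rather than widening it) are all hypothetical, and any sanitizing of branch names into scope names is ignored.

```typescript
// Hypothetical description of one indexed scope.
interface ScopeInfo {
  name: string;
  lastIndexedAt: Date;
}

const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Returns names of scopes with no matching git branch, optionally limited
// to scopes last indexed more than `olderThanDays` days before `now`.
function findStaleScopes(
  scopes: ScopeInfo[],
  branches: string[],
  olderThanDays?: number,
  now: Date = new Date()
): string[] {
  const live = new Set(branches);
  return scopes
    .filter((scope) => !live.has(scope.name)) // orphaned: branch no longer exists
    .filter((scope) => {
      if (olderThanDays === undefined) return true;
      const ageDays = (now.getTime() - scope.lastIndexedAt.getTime()) / MS_PER_DAY;
      return ageDays > olderThanDays;
    })
    .map((scope) => scope.name);
}
```

Keeping the function pure (branches and the clock are passed in) mirrors the dry-run default: the caller decides whether to print the list or delete the scopes.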
679
828
 
680
- # Apply deletions
681
- pnpm searchsocket prune --older-than 30d --apply
829
+ ### `searchsocket mcp`
682
830
 
683
- # Use custom scope list
684
- pnpm searchsocket prune --scopes-file active-branches.txt --apply
831
+ Run the MCP server standalone.
832
+
833
+ ```bash
834
+ pnpm searchsocket mcp # stdio (default)
835
+ pnpm searchsocket mcp --transport http --port 3338 # HTTP
836
+ pnpm searchsocket mcp --access public --api-key SECRET # public with auth
685
837
  ```
686
838
 
687
- ### `searchsocket doctor`
839
+ ### `searchsocket add`
688
840
 
689
- Validate config, env vars, and connectivity.
841
+ Copy Svelte 5 search UI template components into your project.
690
842
 
691
843
  ```bash
692
- pnpm searchsocket doctor
693
-
694
- # Output:
695
- # PASS config parse
696
- # PASS env JINA_API_KEY
697
- # PASS turso/libsql (local file: .searchsocket/vectors.db)
698
- # PASS source: build manifest
699
- # PASS source: vite binary
700
- # PASS embedding provider connectivity
701
- # PASS vector backend connectivity
702
- # PASS vector backend write permission
703
- # PASS state directory writable
844
+ pnpm searchsocket add search-dialog
845
+ pnpm searchsocket add search-input
846
+ pnpm searchsocket add search-results
847
+ pnpm searchsocket add search-dialog --dir src/lib/components/ui # custom dir
704
848
  ```
705
849
 
706
- ### `searchsocket mcp`
850
+ ## Real-World Example
707
851
 
708
- Run MCP server for Claude Desktop / other MCP clients.
852
+ Here's how [Canopy](https://canopy.dev) integrates SearchSocket into a production SvelteKit site.
709
853
 
710
- ```bash
711
- # stdio transport (default)
712
- pnpm searchsocket mcp
854
+ ### Configuration
713
855
 
714
- # HTTP transport
715
- pnpm searchsocket mcp --transport http --port 3338
856
+ ```ts
857
+ // searchsocket.config.ts
858
+ export default {
859
+ project: {
860
+ id: "canopy-website",
861
+ baseUrl: "https://canopy.dev"
862
+ },
863
+ source: {
864
+ mode: "build"
865
+ },
866
+ extract: {
867
+ dropSelectors: [".nav-blur", ".mobile-overlay", ".docs-sidebar"]
868
+ },
869
+ ranking: {
870
+ minScoreRatio: 0.70,
871
+ pageWeights: {
872
+ "/": 0.95,
873
+ "/download": 1.05,
874
+ "/docs/**": 1.05
875
+ },
876
+ aggregationCap: 3,
877
+ aggregationDecay: 0.3
878
+ }
879
+ };
716
880
  ```
717
881
 
718
- ### `searchsocket search`
882
+ ### Server hook
719
883
 
720
- CLI search for testing.
884
+ ```ts
885
+ // src/hooks.server.ts
886
+ import { searchsocketHandle } from "searchsocket/sveltekit";
887
+ import { env } from "$env/dynamic/private";
721
888
 
722
- ```bash
723
- pnpm searchsocket search --q "turso vector search" --top-k 5 --rerank
889
+ export const handle = searchsocketHandle({
890
+ rawConfig: {
891
+ project: { id: "canopy-website", baseUrl: "https://canopy.dev" },
892
+ source: { mode: "build" },
893
+ upstash: {
894
+ url: env.UPSTASH_VECTOR_REST_URL,
895
+ token: env.UPSTASH_VECTOR_REST_TOKEN
896
+ },
897
+ extract: {
898
+ dropSelectors: [".nav-blur", ".mobile-overlay", ".docs-sidebar"]
899
+ },
900
+ ranking: {
901
+ minScoreRatio: 0.70,
902
+ pageWeights: { "/": 0.95, "/download": 1.05, "/docs/**": 1.05 },
903
+ aggregationCap: 3,
904
+ aggregationDecay: 0.3
905
+ }
906
+ }
907
+ });
724
908
  ```
725
909
 
726
- ## MCP (Model Context Protocol)
910
+ ### Search modal with scoped search
727
911
 
728
- SearchSocket provides an **MCP server** for integration with Claude Code, Claude Desktop, and other MCP-compatible AI tools. This gives AI assistants direct access to your indexed site content for semantic search and page retrieval.
912
+ ```svelte
913
+ <!-- SearchModal.svelte -->
914
+ <script>
915
+ import { createSearchClient, buildResultUrl } from "searchsocket/client";
916
+
917
+ let { open = $bindable(false), pathPrefix = "", placeholder = "Search..." } = $props();
918
+
919
+ const client = createSearchClient();
920
+ let query = $state("");
921
+ let results = $state([]);
922
+
923
+ async function doSearch() {
924
+ if (!query.trim()) { results = []; return; }
925
+ const res = await client.search({
926
+ q: query,
927
+ topK: 8,
928
+ groupBy: "page",
929
+ pathPrefix: pathPrefix || undefined
930
+ });
931
+ results = res.results;
932
+ }
933
+ </script>
934
+
935
+ {#if open}
936
+ <dialog open>
937
+ <input bind:value={query} oninput={doSearch} {placeholder} />
938
+ {#each results as result}
939
+ <a href={buildResultUrl(result)} onclick={() => open = false}>
940
+ <strong>{result.title}</strong>
941
+ {#if result.sectionTitle}<span>— {result.sectionTitle}</span>{/if}
942
+ <p>{result.snippet}</p>
943
+ </a>
944
+ {/each}
945
+ </dialog>
946
+ {/if}
947
+ ```
729
948
 
730
- ### Tools
949
+ ### Scroll-to-text in layout
950
+
951
+ ```svelte
952
+ <!-- src/routes/+layout.svelte -->
953
+ <script>
954
+ import { afterNavigate } from "$app/navigation";
955
+ import { searchsocketScrollToText } from "searchsocket/sveltekit";
956
+
957
+ afterNavigate(searchsocketScrollToText);
958
+ </script>
959
+ ```
731
960
 
732
- **`search(query, opts?)`**
733
- - Semantic search across indexed content
734
- - Returns ranked results with URL, title, snippet, score, and routeFile
735
- - Options: `scope`, `topK` (1-100), `pathPrefix`, `tags`, `groupBy` (`"page"` | `"chunk"`)
961
+ ### Deploy and index
736
962
 
737
- **`get_page(pathOrUrl, opts?)`**
738
- - Retrieve full indexed page content as markdown with frontmatter
739
- - Options: `scope`
963
+ Indexing runs automatically on every Vercel deploy. Set these env vars in the Vercel dashboard:
740
964
 
741
- ### Setup (Claude Code)
965
+ - `UPSTASH_VECTOR_REST_URL`
966
+ - `UPSTASH_VECTOR_REST_TOKEN`
967
+ - `SEARCHSOCKET_AUTO_INDEX=1`
742
968
 
743
- Add a `.mcp.json` file to your project root (safe to commit — no secrets needed since the CLI auto-loads `.env`):
969
+ The Vite plugin handles the rest. Alternatively, use a postbuild script:
744
970
 
745
971
  ```json
746
972
  {
747
- "mcpServers": {
748
- "searchsocket": {
749
- "type": "stdio",
750
- "command": "npx",
751
- "args": ["searchsocket", "mcp"],
752
- "env": {}
753
- }
973
+ "scripts": {
974
+ "build": "vite build",
975
+ "postbuild": "searchsocket index"
754
976
  }
755
977
  }
756
978
  ```
757
979
 
758
- Restart Claude Code. The `search` and `get_page` tools will be available automatically. Verify with:
759
-
760
- ```bash
761
- claude mcp list
762
- ```
763
-
764
- ### Setup (Claude Desktop)
765
-
766
- Add to `~/Library/Application Support/Claude/claude_desktop_config.json`:
980
+ ### Connect Claude Code to the deployed site
767
981
 
768
982
  ```json
769
983
  {
770
984
  "mcpServers": {
771
985
  "searchsocket": {
772
- "command": "npx",
773
- "args": ["searchsocket", "mcp"],
774
- "cwd": "/path/to/your/project"
986
+ "type": "http",
987
+ "url": "https://canopy.dev/api/mcp"
775
988
  }
776
989
  }
777
990
  }
778
991
  ```
779
992
 
780
- Restart Claude Desktop. The tools appear in the MCP menu.
993
+ Now Claude Code can search the live docs, retrieve page content, and find source files — all backed by the production index that stays current with every deploy.
781
994
 
782
- ### HTTP Transport
995
+ ### Excluding pages from search
783
996
 
784
- For non-stdio clients, run the MCP server over HTTP:
785
-
786
- ```bash
787
- npx searchsocket mcp --transport http --port 3338
997
+ ```svelte
998
+ <!-- src/routes/blog/+page.svelte (archive page) -->
999
+ <svelte:head>
1000
+ <meta name="searchsocket-weight" content="0" />
1001
+ </svelte:head>
788
1002
  ```
789
1003
 
790
- This starts a stateless server at `http://127.0.0.1:3338/mcp`. Each POST request creates a fresh server instance with no session persistence.
1004
+ Or with the component:
791
1005
 
792
- ## Environment Variables
1006
+ ```svelte
1007
+ <script>
1008
+ import { SearchSocket } from "searchsocket/svelte";
1009
+ </script>
1010
+
1011
+ <SearchSocket weight={0} />
1012
+ ```
793
1013
 
794
- The CLI automatically loads `.env` from the working directory on startup. Existing `process.env` values take precedence over `.env` file values. This only applies to CLI commands (`searchsocket index`, `searchsocket mcp`, etc.) — library imports like `searchsocketHandle()` rely on your framework's own `.env` handling (Vite/SvelteKit).
1014
+ ### Vite SSR config
795
1015
 
796
- ### Required
1016
+ ```ts
1017
+ // vite.config.ts
1018
+ import { sveltekit } from "@sveltejs/kit/vite";
1019
+ import { defineConfig } from "vite";
1020
+
1021
+ export default defineConfig({
1022
+ plugins: [sveltekit()],
1023
+ ssr: {
1024
+ external: ["searchsocket", "searchsocket/sveltekit", "searchsocket/client"]
1025
+ }
1026
+ });
1027
+ ```
797
1028
 
798
- **Jina AI:**
799
- - `JINA_API_KEY` — Jina AI API key for embeddings and reranking
1029
+ ## Environment Variables
800
1030
 
801
- ### Optional (Turso)
1031
+ ### Required
802
1032
 
803
- **Remote Turso (production):**
804
- - `TURSO_DATABASE_URL` — Turso database URL (e.g., `libsql://my-db.turso.io`)
805
- - `TURSO_AUTH_TOKEN`: Turso auth token
1033
+ | Variable | Description |
1034
+ |----------|-------------|
1035
+ | `UPSTASH_VECTOR_REST_URL` | Upstash Vector REST API endpoint |
1036
+ | `UPSTASH_VECTOR_REST_TOKEN` | Upstash Vector REST API token |
806
1037
 
807
- If not set, uses local file DB at `.searchsocket/vectors.db`.
1038
+ ### Optional
808
1039
 
809
- ### Optional (Scope/Build)
1040
+ | Variable | Description |
1041
+ |----------|-------------|
1042
+ | `SEARCHSOCKET_SCOPE` | Override scope (when `scope.mode: "env"`) |
1043
+ | `SEARCHSOCKET_AUTO_INDEX` | Enable build-triggered indexing (`1`, `true`, or `yes`) |
1044
+ | `SEARCHSOCKET_DISABLE_AUTO_INDEX` | Disable build-triggered indexing |
1045
+ | `SEARCHSOCKET_FORCE_REINDEX` | Force full re-index in CI/CD |
810
1046
 
811
- - `SEARCHSOCKET_SCOPE`: Override scope (when `scope.mode: "env"`)
812
- - `SEARCHSOCKET_AUTO_INDEX` — Enable build-triggered indexing
813
- - `SEARCHSOCKET_DISABLE_AUTO_INDEX` — Disable build-triggered indexing
1047
+ The CLI automatically loads `.env` from the working directory on startup.
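The table lists `1`, `true`, or `yes` as values that enable `SEARCHSOCKET_AUTO_INDEX`. A minimal sketch of how such flags might be read follows. The helper names are hypothetical, and three behaviors are assumptions rather than documented facts: case-insensitive matching, the disable flag accepting the same values, and the disable flag taking precedence when both are set.

```typescript
// Values the README documents as enabling SEARCHSOCKET_AUTO_INDEX.
const TRUTHY_VALUES = new Set(["1", "true", "yes"]);

// Assumption: matching ignores case and surrounding whitespace.
function isFlagEnabled(value: string | undefined): boolean {
  return value !== undefined && TRUTHY_VALUES.has(value.trim().toLowerCase());
}

// Assumption: SEARCHSOCKET_DISABLE_AUTO_INDEX wins when both flags are set.
function shouldAutoIndex(env: Record<string, string | undefined>): boolean {
  if (isFlagEnabled(env.SEARCHSOCKET_DISABLE_AUTO_INDEX)) return false;
  return isFlagEnabled(env.SEARCHSOCKET_AUTO_INDEX);
}
```

In a Node.js build script this would be called as `shouldAutoIndex(process.env)`.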
814
1048
 
815
- ## Configuration
1049
+ ## Configuration Reference
816
1050
 
817
- ### Full Example
1051
+ See [docs/config.md](docs/config.md) for the full configuration reference. Here's a full example:
818
1052
 
819
1053
  ```ts
820
1054
  export default {
@@ -824,41 +1058,24 @@ export default {
824
1058
  },
825
1059
 
826
1060
  scope: {
827
- mode: "git", // "fixed" | "git" | "env"
1061
+ mode: "git", // "fixed" | "git" | "env"
828
1062
  fixed: "main",
829
1063
  sanitize: true
830
1064
  },
831
1065
 
1066
+ exclude: ["/admin/*", "/api/*"],
1067
+ respectRobotsTxt: true,
1068
+
832
1069
  source: {
833
- mode: "build", // "static-output" | "crawl" | "content-files" | "build"
1070
+ mode: "build",
834
1071
  staticOutputDir: "build",
835
- strictRouteMapping: false,
836
-
837
- // Build mode (recommended for CI/CD)
838
1072
  build: {
839
- outputDir: ".svelte-kit/output",
840
- previewTimeout: 30000,
841
1073
  exclude: ["/api/*"],
842
1074
  paramValues: {
843
1075
  "/blog/[slug]": ["hello-world", "getting-started"]
844
1076
  },
845
- discover: false,
846
- seedUrls: ["/"],
847
- maxPages: 200,
848
- maxDepth: 5
849
- },
850
-
851
- // Crawl mode (alternative)
852
- crawl: {
853
- baseUrl: "http://localhost:4173",
854
- routes: ["/", "/docs", "/blog"],
855
- sitemapUrl: "https://example.com/sitemap.xml"
856
- },
857
-
858
- // Content files mode (alternative)
859
- contentFiles: {
860
- globs: ["src/routes/**/*.md"],
861
- baseDir: "."
1077
+ discover: true,
1078
+ maxPages: 200
862
1079
  }
863
1080
  },
864
1081
 
@@ -868,77 +1085,77 @@ export default {
868
1085
  dropSelectors: [".sidebar", ".toc"],
869
1086
  ignoreAttr: "data-search-ignore",
870
1087
  noindexAttr: "data-search-noindex",
871
- respectRobotsNoindex: true
1088
+ imageDescAttr: "data-search-description"
872
1089
  },
873
1090
 
874
1091
  chunking: {
875
- maxChars: 2200,
1092
+ maxChars: 1500,
876
1093
  overlapChars: 200,
877
1094
  minChars: 250,
878
- headingPathDepth: 3,
879
- dontSplitInside: ["code", "table", "blockquote"],
880
- prependTitle: true, // prepend page title to chunk text before embedding
881
- pageSummaryChunk: true // generate synthetic identity chunk per page
1095
+ prependTitle: true,
1096
+ pageSummaryChunk: true
882
1097
  },
883
1098
 
884
- embeddings: {
885
- provider: "jina",
886
- model: "jina-embeddings-v5-text-small",
887
- apiKey: "jina_...", // direct API key (or use apiKeyEnv)
888
- apiKeyEnv: "JINA_API_KEY",
889
- batchSize: 64,
890
- concurrency: 4
1099
+ upstash: {
1100
+ urlEnv: "UPSTASH_VECTOR_REST_URL",
1101
+ tokenEnv: "UPSTASH_VECTOR_REST_TOKEN"
891
1102
  },
892
1103
 
893
- vector: {
894
- dimension: 1024, // optional, inferred from first embedding
895
- turso: {
896
- url: "libsql://my-db.turso.io", // direct URL (or use urlEnv)
897
- authToken: "eyJhbGc...", // direct token (or use authTokenEnv)
898
- urlEnv: "TURSO_DATABASE_URL",
899
- authTokenEnv: "TURSO_AUTH_TOKEN",
900
- localPath: ".searchsocket/vectors.db"
901
- }
902
- },
903
-
904
- rerank: {
905
- enabled: true,
906
- topN: 20,
907
- model: "jina-reranker-v3"
1104
+ search: {
1105
+ dualSearch: true,
1106
+ pageSearchWeight: 0.3
908
1107
  },
909
1108
 
910
1109
  ranking: {
911
1110
  enableIncomingLinkBoost: true,
912
1111
  enableDepthBoost: true,
913
- pageWeights: {
914
- "/": 1.1,
915
- "/docs": 1.15
916
- },
917
- minScore: 0,
1112
+ pageWeights: { "/docs": 1.15 },
1113
+ minScoreRatio: 0.70,
918
1114
  aggregationCap: 5,
919
- aggregationDecay: 0.5,
920
- minChunkScoreRatio: 0.5,
921
- weights: {
922
- incomingLinks: 0.05,
923
- depth: 0.03,
924
- rerank: 1.0,
925
- aggregation: 0.1
926
- }
1115
+ aggregationDecay: 0.5
927
1116
  },
928
1117
 
929
1118
  api: {
930
1119
  path: "/api/search",
931
- cors: {
932
- allowOrigins: ["https://example.com"]
933
- },
934
- rateLimit: {
935
- windowMs: 60_000,
936
- max: 60
937
- }
1120
+ cors: { allowOrigins: ["https://example.com"] }
1121
+ },
1122
+
1123
+ mcp: {
1124
+ enable: true,
1125
+ handle: { path: "/api/mcp" }
1126
+ },
1127
+
1128
+ llmsTxt: {
1129
+ enable: true,
1130
+ title: "My Project",
1131
+ description: "Documentation for My Project"
1132
+ },
1133
+
1134
+ state: {
1135
+ dir: ".searchsocket"
938
1136
  }
939
1137
  };
940
1138
  ```
941
1139
 
1140
+ ## CI/CD
1141
+
1142
+ See [docs/ci.md](docs/ci.md) for ready-to-use GitHub Actions workflows covering:
1143
+
1144
+ - Main branch indexing on push
1145
+ - PR dry-run validation
1146
+ - Preview branch scope isolation
1147
+ - Scheduled scope pruning
1148
+ - Vercel build-triggered indexing
1149
+
1150
+ ## Further Reading
1151
+
1152
+ - [Building a Search UI](docs/search-ui.md) — Cmd+K modals, scoped search, styling, and API reference
1153
+ - [Tuning Search Relevance](docs/tuning.md) — visual playground, ranking parameters, and search quality testing
1154
+ - [Configuration Reference](docs/config.md) — all config options, indexing hooks, and custom records
1155
+ - [CI/CD Workflows](docs/ci.md) — GitHub Actions and Vercel integration
1156
+ - [MCP over HTTP Guide](docs/mcp-claude-code.md) — detailed HTTP MCP setup for Claude Code
1157
+ - [Troubleshooting](docs/troubleshooting.md) — common issues, diagnostics, and FAQ
1158
+
942
1159
  ## License
943
1160
 
944
1161
  MIT