searchsocket 0.4.0 → 0.6.0

package/README.md CHANGED
@@ -1,34 +1,44 @@
1
1
  # SearchSocket
2
2
 
3
- Semantic site search and MCP retrieval for SvelteKit content projects.
3
+ Semantic site search and MCP retrieval for SvelteKit content projects. Index your site, search it from the browser or AI tools, and scroll users to the exact content they're looking for.
4
4
 
5
- **Requirements**: Node.js >= 20
5
+ **Requirements**: Node.js >= 20 | **Backend**: [Upstash Vector](https://upstash.com/docs/vector/overall/getstarted) | **License**: MIT
6
+
7
+ ## How it works
8
+
9
+ ```
10
+ SvelteKit Pages → Extractor (Cheerio + Turndown) → Chunker → Upstash Vector
11
+
12
+ Search UI ← SvelteKit API Hook ← Search Engine + Ranking
13
+
14
+ MCP Endpoint → Claude Code / Claude Desktop
15
+ ```
16
+
17
+ SearchSocket extracts content from your SvelteKit site, converts it to markdown, splits it into chunks, and stores them in Upstash Vector. At runtime, the SvelteKit hook serves both a search API for your frontend and an MCP endpoint for AI tools.
6
18
 
7
19
  ## Features
8
20
 
9
- - **Embeddings**: Jina AI `jina-embeddings-v5-text-small` with task-specific LoRA adapters (configurable)
10
- - **Vector Backend**: Turso/libSQL with vector search (local file DB for development, remote for production)
11
- - **Rerank**: Jina `jina-reranker-v3` enabled by default, same API key
12
- - **Page Aggregation**: Group results by page with score-weighted chunk decay
13
- - **Meta Extraction**: Automatically extracts `<meta name="description">` and `<meta name="keywords">` for improved relevance
14
- - **SvelteKit Integrations**:
15
- - `searchsocketHandle()` for `POST /api/search` endpoint
16
- - `searchsocketVitePlugin()` for build-triggered indexing
17
- - **Client Library**: `createSearchClient()` for browser-side search
18
- - **MCP Server**: Model Context Protocol tools for search and page retrieval
19
- - **Git-Tracked Markdown Mirror**: Commit-safe deterministic markdown outputs
21
+ - **Semantic + keyword search** — Upstash Vector handles hybrid search with built-in reranking and input enrichment
22
+ - **Dual search**: parallel page-level and chunk-level queries with configurable score blending
23
+ - **Scroll-to-text**: auto-scroll to the matching section when a user clicks a search result, with CSS Highlight API and Text Fragment support
24
+ - **SvelteKit integration**: server hook for the search API, Vite plugin for build-triggered indexing
25
+ - **Svelte 5 components**: reactive `createSearch` store and `<SearchSocket>` metadata component
26
+ - **MCP server** — six tools for Claude Code, Claude Desktop, and other MCP clients (stdio + HTTP)
27
+ - **llms.txt generation**: auto-generate LLM-friendly site indexes during indexing
28
+ - **Four source modes** — index from static output, build manifest, a running server, or raw markdown files
29
+ - **CLI**: init, index, search, dev, status, doctor, clean, prune, test, mcp, add
20
30
 
21
31
  ## Install
22
32
 
23
33
  ```bash
24
- # pnpm
25
34
  pnpm add -D searchsocket
26
-
27
- # npm
28
- npm install -D searchsocket
29
35
  ```
30
36
 
31
- SearchSocket is typically a dev dependency for CLI indexing. If you use `searchsocketHandle()` at runtime (e.g., in a Node server adapter), add it as a regular dependency instead.
37
+ SearchSocket is typically a dev dependency since indexing runs at build time. If you use `searchsocketHandle()` at runtime (e.g., in a Node server adapter or serving the MCP endpoint from a production deployment), add it as a regular dependency:
38
+
39
+ ```bash
40
+ pnpm add searchsocket
41
+ ```
32
42
 
33
43
  ## Quickstart
34
44
 
@@ -38,100 +48,134 @@ SearchSocket is typically a dev dependency for CLI indexing. If you use `searchs
38
48
  pnpm searchsocket init
39
49
  ```
40
50
 
41
- This creates:
42
- - `searchsocket.config.ts` — minimal config file
43
- - `.searchsocket/` — state directory (added to `.gitignore`)
51
+ This creates `searchsocket.config.ts` and the `.searchsocket/` state directory, wires up your SvelteKit hooks and Vite config, and generates `.mcp.json` for Claude Code.
44
52
 
45
53
  ### 2. Configure
46
54
 
47
55
  Minimal config (`searchsocket.config.ts`):
48
56
 
49
57
  ```ts
50
- export default {
51
- embeddings: { apiKeyEnv: "JINA_API_KEY" }
52
- };
58
+ export default {};
53
59
  ```
54
60
 
55
- **That's it!** Turso defaults work out of the box:
56
- - **Development**: Uses local file DB at `.searchsocket/vectors.db`
57
- - **Production**: Set `TURSO_DATABASE_URL` and `TURSO_AUTH_TOKEN` to use remote Turso
61
+ That's it: defaults handle the rest. SearchSocket reads `UPSTASH_VECTOR_REST_URL` and `UPSTASH_VECTOR_REST_TOKEN` from your environment automatically.
58
62
 
59
- ### 3. Add SvelteKit API Hook
63
+ ### 3. Set environment variables
60
64
 
61
- Create or update `src/hooks.server.ts`:
65
+ ```bash
66
+ # .env
67
+ UPSTASH_VECTOR_REST_URL=https://...
68
+ UPSTASH_VECTOR_REST_TOKEN=...
69
+ ```
70
+
71
+ Create an [Upstash Vector index](https://console.upstash.com/vector) with the `bge-large-en-v1.5` embedding model (1024 dimensions). Copy the REST URL and token.
72
+
73
+ ### 4. Add the SvelteKit hook
74
+
75
+ The `init` command does this for you, but if you need to do it manually:
62
76
 
63
77
  ```ts
78
+ // src/hooks.server.ts
64
79
  import { searchsocketHandle } from "searchsocket/sveltekit";
65
80
 
66
81
  export const handle = searchsocketHandle();
67
82
  ```
68
83
 
69
- This exposes `POST /api/search` with automatic scope resolution.
84
+ This exposes `POST /api/search`, `GET /api/search/health`, the MCP endpoint at `/api/mcp`, and page retrieval routes.
85
+
86
+ If you run into SSR bundling issues, mark SearchSocket as external in your Vite config:
87
+
88
+ ```ts
89
+ // vite.config.ts
90
+ export default defineConfig({
91
+ plugins: [sveltekit()],
92
+ ssr: {
93
+ external: ["searchsocket", "searchsocket/sveltekit", "searchsocket/client"]
94
+ }
95
+ });
96
+ ```
70
97
 
71
- ### 4. Set Environment Variables
98
+ ### 5. Add search to your frontend
72
99
 
73
- The CLI automatically loads `.env` from the working directory on startup, so your existing `.env` file works out of the box — no wrapper scripts or shell exports needed.
100
+ Copy the search dialog template into your project:
74
101
 
75
- Development (`.env`):
76
102
  ```bash
77
- JINA_API_KEY=jina_...
103
+ pnpm searchsocket add search-dialog
78
104
  ```
79
105
 
80
- Production (add these for remote Turso):
81
- ```bash
82
- JINA_API_KEY=jina_...
83
- TURSO_DATABASE_URL=libsql://your-db.turso.io
84
- TURSO_AUTH_TOKEN=eyJ...
106
+ This copies a Svelte 5 component to `src/lib/components/search/SearchDialog.svelte` with Cmd+K built in. Import it in your layout and add the scroll-to-text handler:
107
+
108
+ ```svelte
109
+ <!-- src/routes/+layout.svelte -->
110
+ <script>
111
+ import { afterNavigate } from "$app/navigation";
112
+ import { searchsocketScrollToText } from "searchsocket/sveltekit";
113
+ import SearchDialog from "$lib/components/search/SearchDialog.svelte";
114
+
115
+ afterNavigate(searchsocketScrollToText);
116
+ </script>
117
+
118
+ <SearchDialog />
119
+
120
+ <slot />
85
121
  ```
86
122
 
87
- ### 5. Index Your Content
123
+ Users can now press Cmd+K to search. See [Building a Search UI](docs/search-ui.md) for scoped search, custom styling, and more patterns.
124
+
125
+ ### 6. Deploy
126
+
127
+ SearchSocket is designed to index automatically on deploy. The `init` command already added the Vite plugin to your config. Set these environment variables on your hosting platform (Vercel, Cloudflare, etc.):
128
+
129
+ | Variable | Value |
130
+ |----------|-------|
131
+ | `UPSTASH_VECTOR_REST_URL` | Your Upstash Vector REST URL |
132
+ | `UPSTASH_VECTOR_REST_TOKEN` | Your Upstash Vector REST token |
133
+ | `SEARCHSOCKET_AUTO_INDEX` | `1` |
134
+
135
+ Every deploy will build your site, index the content, and serve the search API — fully automated.
136
+
137
+ For local testing, you can also build and index manually:
88
138
 
89
139
  ```bash
90
- pnpm searchsocket index --changed-only
140
+ pnpm build
141
+ pnpm searchsocket index
142
+ ```
143
+
144
+ ### 7. Connect Claude Code (optional)
145
+
146
+ Point Claude Code at your deployed site's MCP endpoint:
147
+
148
+ ```json
149
+ {
150
+ "mcpServers": {
151
+ "searchsocket": {
152
+ "type": "http",
153
+ "url": "https://your-site.com/api/mcp"
154
+ }
155
+ }
156
+ }
91
157
  ```
92
158
 
93
- SearchSocket auto-detects the source mode based on your config:
94
- - **`static-output`** (default): Reads prerendered HTML from `build/`
95
- - **`build`**: Discovers routes from SvelteKit build manifest and renders via preview server
96
- - **`crawl`**: Fetches pages from a running HTTP server
97
- - **`content-files`**: Reads markdown/svelte source files directly
159
+ See [MCP Server](#mcp-server) for authentication and other options.
98
160
 
99
- The indexing pipeline:
100
- - Extracts content from `<main>` (configurable), including `<meta>` description and keywords
101
- - Chunks text with semantic heading boundaries
102
- - Prepends page title to each chunk for embedding context
103
- - Generates a synthetic summary chunk per page for identity matching
104
- - Generates embeddings via Jina AI (with task-specific LoRA adapters for indexing vs search)
105
- - Stores vectors in Turso/libSQL with cosine similarity index
161
+ ### Querying the API directly
106
162
 
107
- ### 6. Query
163
+ The search API is also available via HTTP and CLI:
108
164
 
109
- **Via API:**
110
165
  ```bash
166
+ # cURL
111
167
  curl -X POST http://localhost:5173/api/search \
112
168
  -H "content-type: application/json" \
113
169
  -d '{"q":"getting started","topK":5,"groupBy":"page"}'
114
- ```
115
-
116
- **Via client library:**
117
- ```ts
118
- import { createSearchClient } from "searchsocket/client";
119
170
 
120
- const client = createSearchClient(); // defaults to /api/search
121
- const response = await client.search({
122
- q: "getting started",
123
- topK: 5,
124
- groupBy: "page",
125
- pathPrefix: "/docs"
126
- });
171
+ # CLI
172
+ pnpm searchsocket search --q "getting started" --top-k 5
127
173
  ```
128
174
 
129
- **Via CLI:**
130
- ```bash
131
- pnpm searchsocket search --q "getting started" --top-k 5 --path-prefix /docs
132
- ```
175
+ ### Response format
176
+
177
+ With `groupBy: "page"` (the default):
133
178
 
134
- **Response** (with `groupBy: "page"`, the default):
135
179
  ```json
136
180
  {
137
181
  "q": "getting started",
@@ -161,18 +205,16 @@ pnpm searchsocket search --q "getting started" --top-k 5 --path-prefix /docs
161
205
  }
162
206
  ],
163
207
  "meta": {
164
- "timingsMs": { "embed": 120, "vector": 15, "rerank": 0, "total": 135 },
165
- "usedRerank": false,
166
- "modelId": "jina-embeddings-v5-text-small"
208
+ "timingsMs": { "total": 135 }
167
209
  }
168
210
  }
169
211
  ```
170
212
 
171
- The `chunks` array appears when a page has multiple matching chunks above the `minChunkScoreRatio` threshold. Use `groupBy: "chunk"` for flat per-chunk results without page aggregation.
213
+ The `chunks` array contains matching sections within each page. Use `groupBy: "chunk"` for flat per-chunk results without page aggregation.
172
214
 
173
215
  ## Source Modes
174
216
 
175
- SearchSocket supports four source modes for loading pages to index.
217
+ SearchSocket supports four ways to load your site content for indexing.
176
218
 
177
219
  ### `static-output` (default)
178
220
 
@@ -182,50 +224,37 @@ Reads prerendered HTML files from SvelteKit's build output directory.
182
224
  export default {
183
225
  source: {
184
226
  mode: "static-output",
185
- staticOutputDir: "build"
227
+ staticOutputDir: "build" // default
186
228
  }
187
229
  };
188
230
  ```
189
231
 
190
- Best for: Sites with fully prerendered pages. Run `vite build` first, then index.
232
+ Best for fully prerendered sites. Run `vite build` first, then `searchsocket index`.
191
233
 
192
234
  ### `build`
193
235
 
194
- Discovers routes automatically from SvelteKit's build manifest and renders them via an ephemeral `vite preview` server. No manual route configuration needed.
236
+ Discovers routes from SvelteKit's build manifest and renders via an ephemeral `vite preview` server. No manual route lists needed.
195
237
 
196
238
  ```ts
197
239
  export default {
198
240
  source: {
241
+ mode: "build",
199
242
  build: {
200
- outputDir: ".svelte-kit/output", // default
201
- previewTimeout: 30000, // ms to wait for server (default)
202
- exclude: ["/api/*", "/admin/*"], // glob patterns to skip
203
- paramValues: { // values for dynamic routes
243
+ exclude: ["/api/*", "/admin/*"],
244
+ paramValues: {
204
245
  "/blog/[slug]": ["hello-world", "getting-started"],
205
246
  "/docs/[category]/[page]": ["guides/quickstart", "api/search"]
206
247
  },
207
- discover: true, // crawl internal links to find pages (default: false)
208
- seedUrls: ["/"], // starting URLs for discovery
209
- maxPages: 200, // max pages to discover (default: 200)
210
- maxDepth: 5 // max link depth from seed URLs (default: 5)
248
+ discover: true, // crawl internal links to find more pages
249
+ seedUrls: ["/"],
250
+ maxPages: 200,
251
+ maxDepth: 5
211
252
  }
212
253
  }
213
254
  };
214
255
  ```
215
256
 
216
- Best for: CI/CD pipelines. Enables `vite build && searchsocket index` with zero route configuration.
217
-
218
- **How it works**:
219
- 1. Parses `.svelte-kit/output/server/manifest-full.js` to discover all page routes
220
- 2. Expands dynamic routes using `paramValues` (skips dynamic routes without values)
221
- 3. Starts an ephemeral `vite preview` server on a random port
222
- 4. Fetches all routes concurrently for SSR-rendered HTML
223
- 5. Provides exact route-to-file mapping (no heuristic matching needed)
224
- 6. Shuts down the preview server
225
-
226
- **Dynamic routes**: Each key in `paramValues` maps to a route ID (e.g., `/blog/[slug]`) or its URL equivalent. Each value in the array replaces all `[param]` segments in the URL. Routes with layout groups like `/(app)/blog/[slug]` also match the URL key `/blog/[slug]`.
227
-
228
- **Link discovery**: Enable `discover: true` to automatically find pages by crawling internal links from `seedUrls`. This is useful when dynamic routes have many parameter values that are impractical to enumerate. The crawler respects `maxPages` and `maxDepth` limits and only follows links within the same origin.
257
+ Best for CI/CD pipelines: `vite build && searchsocket index` with zero route configuration.
229
258
 
230
259
  ### `crawl`
231
260
 
@@ -234,24 +263,24 @@ Fetches pages from a running HTTP server.
234
263
  ```ts
235
264
  export default {
236
265
  source: {
266
+ mode: "crawl",
237
267
  crawl: {
238
268
  baseUrl: "http://localhost:4173",
239
- routes: ["/", "/docs", "/blog"], // explicit routes
240
- sitemapUrl: "https://example.com/sitemap.xml" // or discover via sitemap
269
+ routes: ["/", "/docs", "/blog"],
270
+ sitemapUrl: "https://example.com/sitemap.xml"
241
271
  }
242
272
  }
243
273
  };
244
274
  ```
245
275
 
246
- If `routes` is omitted and no `sitemapUrl` is set, defaults to crawling `["/"]` only.
247
-
248
276
  ### `content-files`
249
277
 
250
- Reads markdown and svelte source files directly, without building or serving.
278
+ Reads markdown and Svelte source files directly, without building or serving.
251
279
 
252
280
  ```ts
253
281
  export default {
254
282
  source: {
283
+ mode: "content-files",
255
284
  contentFiles: {
256
285
  globs: ["src/routes/**/*.md", "content/**/*.md"],
257
286
  baseDir: "."
@@ -262,541 +291,764 @@ export default {
262
291
 
263
292
  ## Client Library
264
293
 
265
- SearchSocket exports a lightweight client for browser-side search:
294
+ ### `createSearchClient(options?)`
295
+
296
+ Lightweight browser-side search client.
266
297
 
267
298
  ```ts
268
299
  import { createSearchClient } from "searchsocket/client";
269
300
 
270
301
  const client = createSearchClient({
271
- endpoint: "/api/search", // default
272
- fetchImpl: fetch // default; override for SSR or testing
302
+ endpoint: "/api/search", // default
303
+ fetchImpl: fetch // override for SSR or testing
273
304
  });
274
305
 
275
- const response = await client.search({
306
+ const { results } = await client.search({
276
307
  q: "deployment guide",
277
308
  topK: 8,
278
309
  groupBy: "page",
279
310
  pathPrefix: "/docs",
280
311
  tags: ["guide"],
281
- rerank: true
312
+ filters: { version: 2 },
313
+ maxSubResults: 3
282
314
  });
283
-
284
- for (const result of response.results) {
285
- console.log(result.url, result.title, result.score);
286
- if (result.chunks) {
287
- for (const chunk of result.chunks) {
288
- console.log(" ", chunk.sectionTitle, chunk.score);
289
- }
290
- }
291
- }
292
315
  ```
293
316
 
294
- ## Vector Backend: Turso/libSQL
295
-
296
- SearchSocket uses **Turso** (libSQL) as its single vector backend, providing a unified experience across development and production.
297
-
298
- ### Local Development
299
-
300
- By default, SearchSocket uses a **local file database**:
301
- - Path: `.searchsocket/vectors.db` (configurable)
302
- - No account or API keys needed
303
- - Full vector search with `libsql_vector_idx` and `vector_top_k`
304
- - Perfect for local development and CI testing
305
-
306
- ### Production (Remote Turso)
307
-
308
- For production, switch to **Turso's hosted service**:
309
-
310
- 1. **Sign up for Turso** (free tier available):
311
- ```bash
312
- # Install Turso CLI
313
- brew install tursodatabase/tap/turso
314
-
315
- # Sign up
316
- turso auth signup
317
+ ### `buildResultUrl(result)`
317
318
 
318
- # Create a database
319
- turso db create searchsocket-prod
319
+ Builds a URL from a search result that includes scroll-to-text metadata:
320
320
 
321
- # Get credentials
322
- turso db show searchsocket-prod --url
323
- turso db tokens create searchsocket-prod
324
- ```
325
-
326
- 2. **Set environment variables**:
327
- ```bash
328
- TURSO_DATABASE_URL=libsql://searchsocket-prod-xxx.turso.io
329
- TURSO_AUTH_TOKEN=eyJhbGc...
330
- ```
331
-
332
- 3. **Index normally** — SearchSocket auto-detects the remote URL and uses it.
333
-
334
- ### Direct Credential Passing
335
-
336
- Instead of environment variables, you can pass credentials directly in the config. This is useful for serverless deployments or multi-tenant setups:
321
+ - `_ssk` query parameter — section title for SvelteKit client-side navigation
322
+ - `_sskt` query parameter — text target snippet for precise scroll
323
+ - `#:~:text=` [Text Fragment](https://developer.mozilla.org/en-US/docs/Web/URI/Fragment/Text_fragments) for native browser scroll on full page loads
337
324
 
338
325
  ```ts
339
- export default {
340
- embeddings: {
341
- apiKey: "jina_..." // direct API key (takes precedence over apiKeyEnv)
342
- },
343
- vector: {
344
- turso: {
345
- url: "libsql://my-db.turso.io", // direct URL
346
- authToken: "eyJhbGc..." // direct auth token
347
- }
348
- }
349
- };
350
- ```
326
+ import { buildResultUrl } from "searchsocket/client";
351
327
 
352
- Direct values take precedence over environment variable lookups (`apiKeyEnv`, `urlEnv`, `authTokenEnv`).
353
-
354
- ### Dimension Mismatch Auto-Recovery
355
-
356
- When switching embedding models (e.g., from a 1536-dim model to Jina's 1024-dim), the vector dimension changes. SearchSocket automatically detects this and recreates the chunks table with the new dimension — no manual intervention needed. A full re-index (`--force`) is still required after switching models.
328
+ const href = buildResultUrl(result);
329
+ // "/docs/getting-started?_ssk=Installation&_sskt=Install+with+pnpm#:~:text=Install%20with%20pnpm"
330
+ ```
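To make the three URL parts above concrete, here is a minimal sketch of how such a URL could be assembled and read back. It is not SearchSocket's actual implementation: `sketchResultUrl`, `readScrollTarget`, and the `sectionTitle`/`textTarget` field names are hypothetical, and only the `_ssk`, `_sskt`, and `#:~:text=` conventions come from the README.

```typescript
// Hypothetical result shape: only the fields this sketch needs.
interface ResultLike {
  url: string;
  sectionTitle?: string; // assumed field name
  textTarget?: string;   // assumed field name
}

// Assemble a result URL carrying scroll-to-text metadata (illustrative only).
function sketchResultUrl(result: ResultLike): string {
  const params = new URLSearchParams();
  if (result.sectionTitle) params.set("_ssk", result.sectionTitle);
  if (result.textTarget) params.set("_sskt", result.textTarget);
  const query = params.toString(); // form encoding: spaces become "+"

  // Text fragments use percent-encoding; "-" is directive syntax, so escape it too.
  const fragment = result.textTarget
    ? "#:~:text=" + encodeURIComponent(result.textTarget).replace(/-/g, "%2D")
    : "";

  return result.url + (query ? "?" + query : "") + fragment;
}

// Read the metadata back, as an afterNavigate-style handler might do.
function readScrollTarget(href: string): { sectionTitle: string | null; textTarget: string | null } {
  const url = new URL(href, "http://localhost"); // base is only needed for relative hrefs
  return {
    sectionTitle: url.searchParams.get("_ssk"),
    textTarget: url.searchParams.get("_sskt")
  };
}
```

The query parameters and the fragment are encoded differently on purpose: `URLSearchParams` uses form encoding, while the text fragment needs percent-encoding.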
357
331
 
358
- ### Why Turso?
332
+ ## Svelte 5 Integration
333
+
334
+ ### `createSearch(options?)`
335
+
336
+ A reactive search store built on Svelte 5 runes with debouncing and LRU caching.
337
+
338
+ ```svelte
339
+ <script>
340
+ import { createSearch } from "searchsocket/svelte";
341
+ import { buildResultUrl } from "searchsocket/client";
342
+
343
+ const search = createSearch({
344
+ endpoint: "/api/search",
345
+ debounce: 250, // ms (default)
346
+ cache: true, // LRU result caching (default)
347
+ cacheSize: 50, // max cached queries (default)
348
+ topK: 10,
349
+ groupBy: "page",
350
+ pathPrefix: "/docs" // scope search to a section
351
+ });
352
+ </script>
353
+
354
+ <input bind:value={search.query} placeholder="Search docs..." />
355
+
356
+ {#if search.loading}
357
+ <p>Searching...</p>
358
+ {/if}
359
+
360
+ {#if search.error}
361
+ <p class="error">{search.error.message}</p>
362
+ {/if}
363
+
364
+ {#each search.results as result}
365
+ <a href={buildResultUrl(result)}>
366
+ <strong>{result.title}</strong>
367
+ {#if result.sectionTitle}
368
+ <span>— {result.sectionTitle}</span>
369
+ {/if}
370
+ </a>
371
+ <p>{result.snippet}</p>
372
+ {/each}
373
+ ```
359
374
 
360
- - **Single backend**: one unified Turso/libSQL store for vectors, metadata, and state
361
- - **Local-first development** — zero external dependencies for local dev
362
- - **Production-ready** — same codebase scales to remote hosted DB
363
- - **Cost-effective** — Turso free tier includes 9GB storage, 500M row reads/month
364
- - **Vector search native** — `F32_BLOB` vectors, cosine similarity index, `vector_top_k` ANN queries
375
+ Call `search.destroy()` to clean up when no longer needed (automatic in component context).
365
376
 
366
- ## Serverless Deployment (Vercel, Netlify, etc.)
377
+ ### `<SearchSocket>` component
367
378
 
368
- SearchSocket works on serverless platforms with a few adjustments:
379
+ Declarative meta tag component for controlling per-page search behavior:
369
380
 
370
- ### Requirements
381
+ ```svelte
382
+ <script>
383
+ import { SearchSocket } from "searchsocket/svelte";
384
+ </script>
371
385
 
372
- 1. **Remote Turso database**: local SQLite is not available in serverless (no persistent filesystem). Set `TURSO_DATABASE_URL` and `TURSO_AUTH_TOKEN` as platform environment variables.
386
+ <!-- Boost this page's search ranking -->
387
+ <SearchSocket weight={1.2} />
373
388
 
374
- 2. **Inline config via `rawConfig`** — the default config loader uses `jiti` to import `searchsocket.config.ts` from disk, which isn't bundled in serverless. Use `rawConfig` to pass config inline:
389
+ <!-- Exclude from search -->
390
+ <SearchSocket noindex />
375
391
 
376
- ```ts
377
- // hooks.server.ts (Vercel / Netlify)
378
- import { searchsocketHandle } from "searchsocket/sveltekit";
392
+ <!-- Add filterable tags -->
393
+ <SearchSocket tags={["guide", "advanced"]} />
379
394
 
380
- export const handle = searchsocketHandle({
381
- rawConfig: {
382
- project: { id: "my-docs-site" },
383
- source: { mode: "static-output" },
384
- embeddings: { apiKeyEnv: "JINA_API_KEY" },
385
- }
386
- });
395
+ <!-- Add structured metadata (filterable via search API) -->
396
+ <SearchSocket meta={{ version: 2, category: "api" }} />
387
397
  ```
388
398
 
389
- 3. **Environment variables**: set these on your platform dashboard:
390
- - `JINA_API_KEY`
391
- - `TURSO_DATABASE_URL`
392
- - `TURSO_AUTH_TOKEN`
399
+ The component renders `<meta>` tags in `<svelte:head>` that SearchSocket reads during indexing.
393
400
 
394
- ### Rate Limiting
401
+ ### Template components
395
402
 
396
- The built-in `InMemoryRateLimiter` auto-disables on serverless platforms (it resets on every cold start). Use your platform's WAF or edge rate-limiting instead.
403
+ Copy ready-made search UI components into your project:
397
404
 
398
- ### What Only Applies to Indexing
405
+ ```bash
406
+ pnpm searchsocket add search-dialog
407
+ pnpm searchsocket add search-input
408
+ pnpm searchsocket add search-results
409
+ ```
399
410
 
400
- The following features are only used during `searchsocket index` (CLI), not the search handler:
401
- - `ensureStateDirs` — creates `.searchsocket/` state directories
402
- - Markdown mirror — writes `.searchsocket/mirror/` files
403
- - Local SQLite fallback — only needed when `TURSO_DATABASE_URL` is not set
411
+ These are Svelte 5 components copied to `src/lib/components/search/` (configurable via `--dir`). They're starting points to customize, not dependencies.
404
412
 
405
- ### Adapter Guidance
413
+ ## Scroll-to-Text Navigation
406
414
 
407
- | Platform | Adapter | Notes |
408
- |----------|---------|-------|
409
- | Vercel | `adapter-auto` (default) | Serverless — use `rawConfig` + remote Turso |
410
- | Netlify | `adapter-netlify` | Serverless — same as Vercel |
411
- | VPS / Docker | `adapter-node` | Long-lived process — no limitations, local SQLite works |
415
+ When a user clicks a search result, SearchSocket scrolls them to the matching section on the destination page.
412
416
 
413
- ## Embeddings: Jina AI
417
+ ### Setup
414
418
 
415
- SearchSocket uses **Jina AI's embedding models** to convert text into semantic vectors. A single `JINA_API_KEY` powers both embeddings and optional reranking.
419
+ Add the scroll handler to your root layout:
416
420
 
417
- ### Default Model
421
+ ```svelte
422
+ <!-- src/routes/+layout.svelte -->
423
+ <script>
424
+ import { afterNavigate } from '$app/navigation';
425
+ import { searchsocketScrollToText } from 'searchsocket/sveltekit';
418
426
 
419
- - **Model**: `jina-embeddings-v5-text-small`
420
- - **Dimensions**: 1024 (default)
421
- - **Cost**: ~$0.00005 per 1K tokens
422
- - **Task adapters**: Uses `retrieval.passage` for indexing, `retrieval.query` for search queries (LoRA task-specific adapters for better retrieval quality)
427
+ afterNavigate(searchsocketScrollToText);
428
+ </script>
429
+ ```
423
430
 
424
- ### How It Works
431
+ ### How it works
425
432
 
426
- 1. **Chunking**: Text is split into semantic chunks (default 2200 chars, 200 overlap)
427
- 2. **Title Prepend**: Page title is prepended to each chunk for better context (`chunking.prependTitle`, default: true)
428
- 3. **Summary Chunk**: A synthetic identity chunk is generated per page with title, URL, and first paragraph (`chunking.pageSummaryChunk`, default: true)
429
- 4. **Embedding**: Each chunk is sent to Jina's embedding API with the `retrieval.passage` task adapter
430
- 5. **Batching**: Requests batched (64 texts per request) for efficiency
431
- 6. **Storage**: Vectors stored in Turso with metadata (URL, title, tags, depth, etc.)
433
+ 1. `buildResultUrl()` encodes the section title and text snippet into the URL
434
+ 2. On SvelteKit client-side navigation, the `afterNavigate` hook reads `_ssk`/`_sskt` params
435
+ 3. A TreeWalker-based text mapper finds the exact position in the DOM
436
+ 4. The page scrolls smoothly to the match
437
+ 5. The matching text is highlighted using the [CSS Custom Highlight API](https://developer.mozilla.org/en-US/docs/Web/API/CSS_Custom_Highlight_API) (with a DOM fallback for older browsers)
438
+ 6. On full page loads, browsers that support Text Fragments (`#:~:text=`) handle scrolling natively
432
439
 
433
- ### Cost Estimation
440
+ The highlight fades after 2 seconds. Customize with CSS:
434
441
 
435
- Use `--dry-run` to preview costs:
436
- ```bash
437
- pnpm searchsocket index --dry-run
442
+ ```css
443
+ ::highlight(ssk-highlight) {
444
+ background-color: rgba(250, 204, 21, 0.4);
445
+ }
438
446
  ```
439
447
 
440
- Output:
441
- ```
442
- pages processed: 42
443
- chunks total: 156
444
- chunks changed: 156
445
- embeddings created: 156
446
- estimated tokens: 32,400
447
- estimated cost (USD): $0.000648
448
- ```
448
+ ## Search & Ranking
449
449
 
450
- ### Reranking
450
+ ### Dual search
451
451
 
452
- Since embeddings and reranking share the same Jina API key, enabling reranking is one boolean:
452
+ By default, SearchSocket runs two parallel queries, one against page-level summaries and one against individual chunks, then blends the scores:
453
453
 
454
454
  ```ts
455
455
  export default {
456
- embeddings: { apiKeyEnv: "JINA_API_KEY" },
457
- rerank: { enabled: true }
456
+ search: {
457
+ dualSearch: true, // default
458
+ pageSearchWeight: 0.3 // weight of page results vs chunks (0-1)
459
+ }
458
460
  };
459
461
  ```
460
462
 
461
- **Note**: Changing the model after indexing requires re-indexing with `--force`.
462
-
463
- ## Search & Ranking
464
-
465
- ### Page Aggregation
463
+ ### Page aggregation
466
464
 
467
- By default (`groupBy: "page"`), SearchSocket groups chunk results by page URL and computes a page-level score:
465
+ With `groupBy: "page"` (default), chunk results are grouped by page URL:
468
466
 
469
467
  1. The top chunk score becomes the base page score
470
- 2. Additional matching chunks contribute a decaying bonus: `chunk_score * decay^i`
471
- 3. Optional per-URL page weights are applied multiplicatively
468
+ 2. Additional matching chunks add a decaying bonus: `chunk_score * decay^i`
469
+ 3. Per-URL page weights are applied multiplicatively
472
470
 
473
- Configure aggregation behavior:
471
+ ### Ranking configuration
474
472
 
475
473
  ```ts
476
474
  export default {
477
475
  ranking: {
478
- minScore: 0, // minimum absolute score to include in results (default: 0, disabled)
479
- aggregationCap: 5, // max chunks contributing to page score (default: 5)
480
- aggregationDecay: 0.5, // decay factor for additional chunks (default: 0.5)
481
- minChunkScoreRatio: 0.5, // threshold for sub-chunks in results (default: 0.5)
482
- pageWeights: { // per-URL score multipliers
483
- "/": 1.1,
476
+ enableIncomingLinkBoost: true, // boost pages with more internal links pointing to them
477
+ enableDepthBoost: true, // boost shallower pages (/ > /docs > /docs/api)
478
+ enableFreshnessBoost: false, // boost recently published content
479
+ enableAnchorTextBoost: false, // boost pages whose link text matches the query
480
+
481
+ pageWeights: { // per-URL score multipliers (prefix matching)
482
+ "/": 0.95,
484
483
  "/docs": 1.15,
485
- "/download": 1.2
484
+ "/download": 1.05
486
485
  },
486
+
487
+ aggregationCap: 5, // max chunks contributing to page score
488
+ aggregationDecay: 0.5, // decay for additional chunks
489
+ minScoreRatio: 0.70, // drop results below 70% of best score
490
+ scoreGapThreshold: 0.4, // trim results >40% below best
491
+ minChunkScoreRatio: 0.5, // threshold for sub-chunks
492
+
487
493
  weights: {
488
- aggregation: 0.1, // weight of aggregation bonus (default: 0.1)
489
- incomingLinks: 0.05, // incoming link boost weight (default: 0.05)
490
- depth: 0.03, // URL depth boost weight (default: 0.03)
491
- rerank: 1.0 // reranker score weight (default: 1.0)
494
+ incomingLinks: 0.05,
495
+ depth: 0.03,
496
+ aggregation: 0.1,
497
+ titleMatch: 0.15,
498
+ freshness: 0.1,
499
+ anchorText: 0.10
492
500
  }
493
501
  }
494
502
  };
495
503
  ```
496
504
 
497
- `pageWeights` supports exact URL matches and prefix matching. A weight of `1.15` on `"/docs"` boosts all pages under `/docs/` by 15%. Use gentle values (1.05-1.2x) since they compound with aggregation.
498
-
499
- `minScore` filters out low-relevance results before they reach the client. Set to a value like `0.3` to remove noise. In page mode, pages below the threshold are dropped; in chunk mode, individual chunks are filtered. Default is `0` (disabled).
500
-
501
- ### Chunk Mode
502
-
503
- Use `groupBy: "chunk"` for flat per-chunk results without page aggregation:
504
-
505
- ```bash
506
- curl -X POST http://localhost:5173/api/search \
507
- -H "content-type: application/json" \
508
- -d '{"q":"vector search","topK":10,"groupBy":"chunk"}'
509
- ```
505
+ Use gentle `pageWeights` values (0.9–1.2) since they compound with other boosts.
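
To make the compounding concrete, here is a minimal sketch of how prefix-matched `pageWeights` and a `minScoreRatio` cutoff could interact. The helper names (`resolvePageWeight`, `applyWeightsAndCutoff`) are hypothetical and are not SearchSocket's API. The sketch also assumes that the longest matching prefix wins, that `"/"` matches only the home page, and that the cutoff is applied after weighting; the actual ranking pipeline may differ.

```ts
// Hypothetical sketch, not SearchSocket internals.
type PageWeights = Record<string, number>;

interface Scored {
  url: string;
  score: number;
}

// Return the multiplier of the longest configured prefix matching the path.
// Assumption: "/" is an exact match for the home page only.
function resolvePageWeight(path: string, weights: PageWeights): number {
  let bestPrefix = "";
  let weight = 1; // neutral multiplier when nothing matches
  for (const prefix of Object.keys(weights)) {
    const matches =
      path === prefix || (prefix !== "/" && path.indexOf(prefix + "/") === 0);
    if (matches && prefix.length > bestPrefix.length) {
      bestPrefix = prefix;
      weight = weights[prefix] as number;
    }
  }
  return weight;
}

// Multiply each score by its page weight, then drop everything that falls
// below minScoreRatio times the best weighted score.
function applyWeightsAndCutoff(
  results: Scored[],
  weights: PageWeights,
  minScoreRatio: number
): Scored[] {
  const weighted = results
    .map((r) => ({ url: r.url, score: r.score * resolvePageWeight(r.url, weights) }))
    .sort((a, b) => b.score - a.score);
  const best = weighted[0];
  if (!best) return [];
  const floor = best.score * minScoreRatio;
  return weighted.filter((r) => r.score >= floor);
}
```

With `"/docs": 1.15`, a `/docs` chunk scoring 0.80 ends up ahead of a home page chunk scoring 0.90 with `"/": 0.95`, which is why small multipliers are usually enough.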
510
506
 
511
507
  ## Build-Triggered Indexing
512
508
 
513
- Automatically index after each SvelteKit build.
509
+ The recommended workflow is to index automatically on every deploy. Add the Vite plugin to your config:
514
510
 
515
- **`vite.config.ts` or `svelte.config.js`:**
516
511
  ```ts
512
+ // vite.config.ts
513
+ import { sveltekit } from "@sveltejs/kit/vite";
517
514
  import { searchsocketVitePlugin } from "searchsocket/sveltekit";
518
515
 
519
516
  export default {
520
517
  plugins: [
521
- svelteKitPlugin(),
518
+ sveltekit(),
522
519
  searchsocketVitePlugin({
523
- enabled: true, // or check process.env.SEARCHSOCKET_AUTO_INDEX
524
- changedOnly: true, // incremental indexing (faster)
525
- verbose: false
520
+ changedOnly: true, // incremental indexing (default)
521
+ verbose: true
526
522
  })
527
523
  ]
528
524
  };
529
525
  ```
530
526
 
531
- **Environment control:**
527
+ ### Vercel / Cloudflare / Netlify
528
+
529
+ Set these environment variables in your hosting platform:
530
+
531
+ | Variable | Value |
532
+ |----------|-------|
533
+ | `UPSTASH_VECTOR_REST_URL` | Your Upstash Vector REST URL |
534
+ | `UPSTASH_VECTOR_REST_TOKEN` | Your Upstash Vector REST token |
535
+ | `SEARCHSOCKET_AUTO_INDEX` | `1` |
536
+
537
+ Every deploy will build your site, index the content into Upstash, and serve the search API and MCP endpoint — fully automated.
538
+
539
+ ### Environment variable control
540
+
532
541
  ```bash
533
- # Enable via env var
542
+ # Enable indexing on build
534
543
  SEARCHSOCKET_AUTO_INDEX=1 pnpm build
535
544
 
536
- # Disable via env var
545
+ # Disable temporarily
537
546
  SEARCHSOCKET_DISABLE_AUTO_INDEX=1 pnpm build
547
+
548
+ # Force full rebuild (ignore incremental cache)
549
+ SEARCHSOCKET_FORCE_REINDEX=1 pnpm build
550
+ ```
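
As a rough illustration of how the three flags might combine, the sketch below treats `1`, `true`, and `yes` as truthy (the values this README lists for `SEARCHSOCKET_AUTO_INDEX`). Everything else is an assumption: the function names are hypothetical, the comparison is case-insensitive, the other two flags accept the same values, and the disable flag wins when both are set.

```ts
// Hypothetical sketch of flag handling, not the plugin's actual code.
type Env = Record<string, string | undefined>;

interface IndexDecision {
  run: boolean;   // should indexing run after this build?
  force: boolean; // should the incremental cache be ignored?
}

// Assumption: "1", "true", and "yes" (any casing) count as enabled.
function isTruthy(value: string | undefined): boolean {
  if (value === undefined) return false;
  return ["1", "true", "yes"].indexOf(value.trim().toLowerCase()) !== -1;
}

// Assumption: SEARCHSOCKET_DISABLE_AUTO_INDEX overrides SEARCHSOCKET_AUTO_INDEX.
function decideAutoIndex(env: Env): IndexDecision {
  const enabled = isTruthy(env["SEARCHSOCKET_AUTO_INDEX"]);
  const disabled = isTruthy(env["SEARCHSOCKET_DISABLE_AUTO_INDEX"]);
  return {
    run: enabled && !disabled,
    force: isTruthy(env["SEARCHSOCKET_FORCE_REINDEX"])
  };
}
```

Calling `decideAutoIndex(process.env)` in a build script would then give a single place to log why indexing did or did not run.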
551
+
552
+ ## Making Images Searchable
553
+
554
+ SearchSocket converts images to text during extraction using this priority chain:
555
+
556
+ 1. `data-search-description` on the `<img>` — your explicit description
557
+ 2. `data-search-description` on the parent `<figure>`
558
+ 3. `alt` text + `<figcaption>` combined
559
+ 4. `alt` text alone (filters generic words like "image", "icon")
560
+ 5. `<figcaption>` alone
561
+ 6. Removed — images with no useful text are dropped
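
The priority chain above can be sketched as a small pure function. This is an illustration under stated assumptions rather than the extractor's real code: `ImageInfo` and `describeImage` are hypothetical names, the generic-word list contains only the two words named above, and the `" - "` joiner for the combined case is a guess.

```ts
// Hypothetical sketch of the image-to-text priority chain.
interface ImageInfo {
  imgDescription?: string;    // data-search-description on the <img>
  figureDescription?: string; // data-search-description on the parent <figure>
  alt?: string;
  figcaption?: string;
}

// Only "image" and "icon" are named in the README; a real list is likely longer.
const GENERIC_ALT_WORDS = ["image", "icon"];

function clean(value: string | undefined): string {
  return (value || "").trim();
}

// Returns the text that stands in for the image, or null when it should be dropped.
function describeImage(img: ImageInfo): string | null {
  const own = clean(img.imgDescription);
  if (own) return own; // 1. explicit description on the image
  const fig = clean(img.figureDescription);
  if (fig) return fig; // 2. explicit description on the figure
  const rawAlt = clean(img.alt);
  const alt = GENERIC_ALT_WORDS.indexOf(rawAlt.toLowerCase()) !== -1 ? "" : rawAlt;
  const caption = clean(img.figcaption);
  if (alt && caption) return alt + " - " + caption; // 3. combined (joiner is assumed)
  if (alt) return alt; // 4. alt alone
  if (caption) return caption; // 5. caption alone
  return null; // 6. nothing useful, so the image is removed
}
```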
562
+
563
+ ```html
564
+ <img
565
+ src="/screenshots/settings.png"
566
+ alt="Settings page"
567
+ data-search-description="The settings page showing API key configuration, theme selection, and notification preferences"
568
+ />
569
+ ```
570
+
571
+ Works with SvelteKit's `enhanced:img`:
572
+
573
+ ```svelte
574
+ <enhanced:img
575
+ src="./screenshots/dashboard.png"
576
+ alt="Dashboard"
577
+ data-search-description="Main dashboard showing active projects and indexing status"
578
+ />
579
+ ```
580
+
581
+ ## MCP Server
582
+
583
+ SearchSocket includes an MCP server that gives Claude Code, Claude Desktop, and other MCP clients direct access to your site's search index. The MCP endpoint is built into `searchsocketHandle()` — once your site is deployed, any MCP client can connect to it over HTTP.
584
+
585
+ ### Available tools
586
+
587
+ | Tool | Description |
588
+ |------|-------------|
589
+ | `search` | Semantic search with filtering, grouping, and reranking |
590
+ | `get_page` | Retrieve full page markdown with frontmatter |
591
+ | `list_pages` | Cursor-paginated page listing |
592
+ | `get_site_structure` | Hierarchical page tree |
593
+ | `find_source_file` | Locate the SvelteKit source file for content |
594
+ | `get_related_pages` | Find related pages by links, semantics, and structure |
595
+
596
+ ### Connecting to your deployed site
597
+
598
+ The recommended setup is to connect Claude Code to your deployed site's MCP endpoint. This way the index stays up to date automatically as you deploy, and there's no local process to manage.
599
+
600
+ Add `.mcp.json` to your project root:
601
+
602
+ ```json
603
+ {
604
+ "mcpServers": {
605
+ "searchsocket": {
606
+ "type": "http",
607
+ "url": "https://your-site.com/api/mcp"
608
+ }
609
+ }
610
+ }
538
611
  ```
539
612
 
540
- ## Git-Tracked Markdown Mirror
613
+ That's it. Restart Claude Code and the six search tools are available. You can search your docs, retrieve page content, and find source files directly from the AI assistant.
541
614
 
542
- Indexing writes a **deterministic markdown mirror**:
615
+ To protect the endpoint, add API key authentication:
543
616
 
617
+ ```ts
618
+ // src/hooks.server.ts
619
+ export const handle = searchsocketHandle({
620
+ rawConfig: {
621
+ mcp: {
622
+ handle: {
623
+ apiKey: process.env.SEARCHSOCKET_MCP_API_KEY
624
+ }
625
+ }
626
+ }
627
+ });
544
628
  ```
545
- .searchsocket/pages/<scope>/<path>.md
629
+
630
+ Then pass the key in `.mcp.json`:
631
+
632
+ ```json
633
+ {
634
+ "mcpServers": {
635
+ "searchsocket": {
636
+ "type": "http",
637
+ "url": "https://your-site.com/api/mcp",
638
+ "headers": {
639
+ "Authorization": "Bearer ${SEARCHSOCKET_MCP_API_KEY}"
640
+ }
641
+ }
642
+ }
643
+ }
644
+ ```
645
+
646
+ The `${SEARCHSOCKET_MCP_API_KEY}` syntax references an environment variable so you don't hardcode secrets in `.mcp.json`.
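
For intuition, the substitution behaves roughly like the sketch below. It is a hypothetical re-implementation for illustration only, not the MCP client's actual code; `expandEnvVars` is an invented name, and throwing on a missing variable is an assumption.

```ts
// Hypothetical sketch of ${VAR} expansion in a config string.
function expandEnvVars(
  template: string,
  env: Record<string, string | undefined>
): string {
  return template.replace(
    /\$\{([A-Za-z_][A-Za-z0-9_]*)\}/g,
    (_match: string, name: string) => {
      const value = env[name];
      if (value === undefined) {
        // Assumption: fail loudly instead of sending an empty token.
        throw new Error("Missing environment variable: " + name);
      }
      return value;
    }
  );
}
```

For example, `expandEnvVars("Bearer ${SEARCHSOCKET_MCP_API_KEY}", process.env)` produces the header value that is sent, so the variable must be exported in the shell that launches the client.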
647
+
648
+ ### Auto-approving in Claude Code
649
+
650
+ Skip the approval prompt each time a tool is called:
651
+
652
+ ```json
653
+ {
654
+ "allowedMcpServers": [
655
+ { "serverName": "searchsocket" }
656
+ ]
657
+ }
546
658
  ```
547
659
 
548
- Example:
660
+ Add this to `.claude/settings.json` in your project.
661
+
662
+ ### Local development
663
+
664
+ During local development, you can point to your dev server instead:
665
+
666
+ ```json
667
+ {
668
+ "mcpServers": {
669
+ "searchsocket": {
670
+ "type": "http",
671
+ "url": "http://localhost:5173/api/mcp"
672
+ }
673
+ }
674
+ }
549
675
  ```
550
- .searchsocket/pages/main/docs/intro.md
676
+
677
+ ### Claude Desktop
678
+
679
+ Add to `~/Library/Application Support/Claude/claude_desktop_config.json` (the macOS location of the Claude Desktop config):
680
+
681
+ ```json
682
+ {
683
+ "mcpServers": {
684
+ "searchsocket": {
685
+ "command": "npx",
686
+ "args": ["searchsocket", "mcp"],
687
+ "cwd": "/path/to/your/project"
688
+ }
689
+ }
690
+ }
551
691
  ```
552
692
 
553
- Each file contains:
554
- - Frontmatter: URL, title, scope, route file, metadata
555
- - Markdown: Extracted content
693
+ ### Standalone HTTP server
556
694
 
557
- **Why commit it?**
558
- - Content workflows (edit markdown, regenerate embeddings)
559
- - Version control for indexed content
560
- - Debugging (see exactly what was indexed)
561
- - Offline search (grep the mirror)
695
+ Run the MCP server as a standalone process (outside SvelteKit):
562
696
 
563
- Add to `.gitignore` if you don't need it:
697
+ ```bash
698
+ pnpm searchsocket mcp --transport http --port 3338
564
699
  ```
565
- .searchsocket/pages/
700
+
701
+ ## llms.txt Generation
702
+
703
+ Generate [llms.txt](https://llmstxt.org/) files during indexing — a standardized way to make your site content available to LLMs.
704
+
705
+ ```ts
706
+ export default {
707
+ project: {
708
+ baseUrl: "https://example.com"
709
+ },
710
+ llmsTxt: {
711
+ enable: true,
712
+ title: "My Project",
713
+ description: "Documentation for My Project",
714
+ outputPath: "static/llms.txt", // default
715
+ generateFull: true, // also generate llms-full.txt
716
+ serveMarkdownVariants: false // serve /page.md variants via the hook
717
+ }
718
+ };
566
719
  ```
567
720
 
568
- ## Commands
721
+ After indexing, `llms.txt` (page index with links) and `llms-full.txt` (full content) are written to your static directory and served by `searchsocketHandle()`.
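
As a rough picture of the output, the sketch below renders a page index in the layout described at llmstxt.org: an H1 title, a blockquote summary, then a section of links. `renderLlmsTxt`, `PageEntry`, and the `## Pages` section name are hypothetical; the file SearchSocket actually writes may be organized differently.

```ts
// Hypothetical sketch of an llms.txt page index.
interface PageEntry {
  title: string;
  path: string;         // site-relative path, e.g. "/docs/intro"
  description?: string; // optional note shown after the link
}

function renderLlmsTxt(
  title: string,
  description: string,
  baseUrl: string,
  pages: PageEntry[]
): string {
  const origin = baseUrl.replace(/\/+$/, ""); // avoid a double slash when joining
  const lines: string[] = ["# " + title, "", "> " + description, "", "## Pages", ""];
  for (const page of pages) {
    const note = page.description ? ": " + page.description : "";
    lines.push("- [" + page.title + "](" + origin + page.path + ")" + note);
  }
  return lines.join("\n") + "\n";
}
```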
722
+
723
+ ## CLI Commands
569
724
 
570
725
  ### `searchsocket init`
571
726
 
572
- Initialize config and state directory.
727
+ Initialize config and state directory. Creates `searchsocket.config.ts`, `.searchsocket/`, `.mcp.json`, and wires up your hooks and Vite config.
573
728
 
574
729
  ```bash
575
730
  pnpm searchsocket init
731
+ pnpm searchsocket init --non-interactive
576
732
  ```
577
733
 
578
734
  ### `searchsocket index`
579
735
 
580
- Index content into vectors.
736
+ Index content into Upstash Vector.
581
737
 
582
738
  ```bash
583
- # Incremental (only changed chunks)
584
- pnpm searchsocket index --changed-only
739
+ pnpm searchsocket index # incremental (default: --changed-only)
740
+ pnpm searchsocket index --force # full re-index
741
+ pnpm searchsocket index --source build # override source mode
742
+ pnpm searchsocket index --scope staging # override scope
743
+ pnpm searchsocket index --dry-run # preview without writing
744
+ pnpm searchsocket index --max-pages 10 # limit for testing
745
+ pnpm searchsocket index --verbose # detailed output
746
+ pnpm searchsocket index --json # machine-readable output
747
+ ```
585
748
 
586
- # Full re-index
587
- pnpm searchsocket index --force
749
+ ### `searchsocket search`
588
750
 
589
- # Preview cost without indexing
590
- pnpm searchsocket index --dry-run
751
+ Run a search from the command line to test the index.
591
752
 
592
- # Override source mode
593
- pnpm searchsocket index --source build
753
+ ```bash
754
+ pnpm searchsocket search --q "getting started" --top-k 5
755
+ pnpm searchsocket search --q "api" --path-prefix /docs
756
+ ```
594
757
 
595
- # Limit for testing
596
- pnpm searchsocket index --max-pages 10 --max-chunks 50
758
+ ### `searchsocket dev`
597
759
 
598
- # Override scope
599
- pnpm searchsocket index --scope staging
760
+ Watch for file changes and auto-reindex, with optional playground UI.
600
761
 
601
- # Verbose output
602
- pnpm searchsocket index --verbose
762
+ ```bash
763
+ pnpm searchsocket dev # watch + playground at :3337
764
+ pnpm searchsocket dev --mcp --mcp-port 3338 # also start MCP HTTP server
765
+ pnpm searchsocket dev --no-playground # watch only
603
766
  ```
604
767
 
605
768
  ### `searchsocket status`
606
769
 
607
- Show indexing status, scope, and vector health.
770
+ Show indexing status and backend health.
608
771
 
609
772
  ```bash
610
773
  pnpm searchsocket status
774
+ ```
775
+
776
+ ### `searchsocket doctor`
777
+
778
+ Validate config, env vars, provider connectivity, and write access.
611
779
 
612
- # Output:
613
- # project: my-site
614
- # resolved scope: main
615
- # embedding model: jina-embeddings-v5-text-small
616
- # vector backend: turso/libsql (local (.searchsocket/vectors.db))
617
- # vector health: ok
618
- # last indexed (main): 2025-02-23T10:30:00Z
619
- # tracked chunks: 156
620
- # last estimated tokens: 32,400
621
- # last estimated cost: $0.000648
780
+ ```bash
781
+ pnpm searchsocket doctor
622
782
  ```
623
783
 
624
- ### `searchsocket dev`
784
+ ### `searchsocket test`
625
785
 
626
- Watch for file changes and auto-reindex.
786
+ Run search quality assertions against the live index.
627
787
 
628
788
  ```bash
629
- pnpm searchsocket dev
789
+ pnpm searchsocket test # uses searchsocket.test.json
790
+ pnpm searchsocket test --file custom-tests.json # custom test file
791
+ ```
792
+
793
+ Test file format:
630
794
 
631
- # With MCP server
632
- pnpm searchsocket dev --mcp --mcp-port 3338
795
+ ```json
796
+ [
797
+ {
798
+ "query": "installation guide",
799
+ "expect": {
800
+ "topResult": "/docs/getting-started",
801
+ "inTop5": ["/docs/getting-started", "/docs/quickstart"]
802
+ }
803
+ }
804
+ ]
633
805
  ```
634
806
 
635
- Watches:
636
- - `src/routes/**` (route files)
637
- - `build/` (if static-output mode)
638
- - Build output dir (if build mode)
639
- - Content files (if content-files mode)
640
- - `searchsocket.config.ts` (if crawl or build mode)
807
+ Reports pass/fail per assertion and Mean Reciprocal Rank (MRR) across all queries.
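
Mean Reciprocal Rank averages `1 / rank` of the expected page over all queries, counting a query as 0 when the page is absent. The sketch below shows the arithmetic. It assumes the rank is measured against each test's `topResult` URL, which this README does not confirm, and the names `QueryOutcome`, `reciprocalRank`, and `meanReciprocalRank` are hypothetical.

```ts
// Hypothetical sketch of the MRR calculation.
interface QueryOutcome {
  ranked: string[]; // result URLs in the order the search returned them
  expected: string; // the URL the test expects to rank first
}

// 1 for rank 1, 0.5 for rank 2, and so on; 0 when the URL is missing.
function reciprocalRank(ranked: string[], expected: string): number {
  const index = ranked.indexOf(expected);
  return index === -1 ? 0 : 1 / (index + 1);
}

function meanReciprocalRank(outcomes: QueryOutcome[]): number {
  if (outcomes.length === 0) return 0;
  let total = 0;
  for (const outcome of outcomes) {
    total += reciprocalRank(outcome.ranked, outcome.expected);
  }
  return total / outcomes.length;
}
```

An MRR close to 1 means the expected page is usually the first hit, while a value near 0.5 means it typically sits in second place.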
641
808
 
642
809
  ### `searchsocket clean`
643
810
 
644
- Delete local state and optionally remote vectors.
811
+ Delete local state and optionally remote indexes.
645
812
 
646
813
  ```bash
647
- # Local state only
648
- pnpm searchsocket clean
649
-
650
- # Local + remote vectors
651
- pnpm searchsocket clean --remote --scope staging
814
+ pnpm searchsocket clean # local state only
815
+ pnpm searchsocket clean --remote # also delete remote scope
816
+ pnpm searchsocket clean --scope staging # specific scope
652
817
  ```
653
818
 
654
819
  ### `searchsocket prune`
655
820
 
656
- Delete stale scopes (e.g., deleted git branches).
821
+ List and delete stale scopes. Compares against git branches to find orphaned scopes.
657
822
 
658
823
  ```bash
659
- # Dry run (shows what would be deleted)
660
- pnpm searchsocket prune --older-than 30d
824
+ pnpm searchsocket prune # dry-run (default)
825
+ pnpm searchsocket prune --apply # actually delete
826
+ pnpm searchsocket prune --older-than 30d # only scopes older than 30 days
827
+ ```
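
The selection logic can be pictured as the sketch below. It is a guess at the shape of the comparison, not the CLI's implementation: `ScopeInfo`, `parseDays`, and `findStaleScopes` are hypothetical names, scope names are assumed to equal branch names, age is assumed to be measured from the last indexing time, and only the `Nd` duration form shown above is parsed.

```ts
// Hypothetical sketch of stale-scope detection.
interface ScopeInfo {
  name: string;
  lastIndexedAt: Date;
}

// Parse a duration such as "30d" into a number of days.
function parseDays(spec: string): number {
  const match = /^(\d+)d$/.exec(spec);
  if (!match) throw new Error("Unsupported duration: " + spec);
  return Number(match[1]);
}

// A scope is stale when no branch has its name and, if olderThan is given,
// when it was last indexed before the cutoff.
function findStaleScopes(
  scopes: ScopeInfo[],
  activeBranches: string[],
  olderThan: string | undefined,
  now: Date
): string[] {
  const dayMs = 24 * 60 * 60 * 1000;
  const cutoff =
    olderThan === undefined ? undefined : now.getTime() - parseDays(olderThan) * dayMs;
  return scopes
    .filter((s) => activeBranches.indexOf(s.name) === -1)
    .filter((s) => cutoff === undefined || s.lastIndexedAt.getTime() < cutoff)
    .map((s) => s.name);
}
```

Because `prune` is a dry run by default, the list can be reviewed before anything is deleted with `--apply`.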
661
828
 
662
- # Apply deletions
663
- pnpm searchsocket prune --older-than 30d --apply
829
+ ### `searchsocket mcp`
830
+
831
+ Run the MCP server standalone.
664
832
 
665
- # Use custom scope list
666
- pnpm searchsocket prune --scopes-file active-branches.txt --apply
833
+ ```bash
834
+ pnpm searchsocket mcp # stdio (default)
835
+ pnpm searchsocket mcp --transport http --port 3338 # HTTP
836
+ pnpm searchsocket mcp --access public --api-key SECRET # public with auth
667
837
  ```
668
838
 
669
- ### `searchsocket doctor`
839
+ ### `searchsocket add`
670
840
 
671
- Validate config, env vars, and connectivity.
841
+ Copy Svelte 5 search UI template components into your project.
672
842
 
673
843
  ```bash
674
- pnpm searchsocket doctor
675
-
676
- # Output:
677
- # PASS config parse
678
- # PASS env JINA_API_KEY
679
- # PASS turso/libsql (local file: .searchsocket/vectors.db)
680
- # PASS source: build manifest
681
- # PASS source: vite binary
682
- # PASS embedding provider connectivity
683
- # PASS vector backend connectivity
684
- # PASS vector backend write permission
685
- # PASS state directory writable
844
+ pnpm searchsocket add search-dialog
845
+ pnpm searchsocket add search-input
846
+ pnpm searchsocket add search-results
847
+ pnpm searchsocket add search-dialog --dir src/lib/components/ui # custom dir
686
848
  ```
687
849
 
688
- ### `searchsocket mcp`
850
+ ## Real-World Example
689
851
 
690
- Run MCP server for Claude Desktop / other MCP clients.
852
+ Here's how [Canopy](https://canopy.dev) integrates SearchSocket into a production SvelteKit site.
691
853
 
692
- ```bash
693
- # stdio transport (default)
694
- pnpm searchsocket mcp
854
+ ### Configuration
695
855
 
696
- # HTTP transport
697
- pnpm searchsocket mcp --transport http --port 3338
856
+ ```ts
857
+ // searchsocket.config.ts
858
+ export default {
859
+ project: {
860
+ id: "canopy-website",
861
+ baseUrl: "https://canopy.dev"
862
+ },
863
+ source: {
864
+ mode: "build"
865
+ },
866
+ extract: {
867
+ dropSelectors: [".nav-blur", ".mobile-overlay", ".docs-sidebar"]
868
+ },
869
+ ranking: {
870
+ minScoreRatio: 0.70,
871
+ pageWeights: {
872
+ "/": 0.95,
873
+ "/download": 1.05,
874
+ "/docs/**": 1.05
875
+ },
876
+ aggregationCap: 3,
877
+ aggregationDecay: 0.3
878
+ }
879
+ };
698
880
  ```
699
881
 
700
- ### `searchsocket search`
882
+ ### Server hook
701
883
 
702
- CLI search for testing.
884
+ ```ts
885
+ // src/hooks.server.ts
886
+ import { searchsocketHandle } from "searchsocket/sveltekit";
887
+ import { env } from "$env/dynamic/private";
703
888
 
704
- ```bash
705
- pnpm searchsocket search --q "turso vector search" --top-k 5 --rerank
889
+ export const handle = searchsocketHandle({
890
+ rawConfig: {
891
+ project: { id: "canopy-website", baseUrl: "https://canopy.dev" },
892
+ source: { mode: "build" },
893
+ upstash: {
894
+ url: env.UPSTASH_VECTOR_REST_URL,
895
+ token: env.UPSTASH_VECTOR_REST_TOKEN
896
+ },
897
+ extract: {
898
+ dropSelectors: [".nav-blur", ".mobile-overlay", ".docs-sidebar"]
899
+ },
900
+ ranking: {
901
+ minScoreRatio: 0.70,
902
+ pageWeights: { "/": 0.95, "/download": 1.05, "/docs/**": 1.05 },
903
+ aggregationCap: 3,
904
+ aggregationDecay: 0.3
905
+ }
906
+ }
907
+ });
908
+ ```
909
+
910
+ ### Search modal with scoped search
911
+
912
+ ```svelte
913
+ <!-- SearchModal.svelte -->
914
+ <script>
915
+ import { createSearchClient, buildResultUrl } from "searchsocket/client";
916
+
917
+ let { open = $bindable(false), pathPrefix = "", placeholder = "Search..." } = $props();
918
+
919
+ const client = createSearchClient();
920
+ let query = $state("");
921
+ let results = $state([]);
922
+
923
+ async function doSearch() {
924
+ if (!query.trim()) { results = []; return; }
925
+ const res = await client.search({
926
+ q: query,
927
+ topK: 8,
928
+ groupBy: "page",
929
+ pathPrefix: pathPrefix || undefined
930
+ });
931
+ results = res.results;
932
+ }
933
+ </script>
934
+
935
+ {#if open}
936
+ <dialog open>
937
+ <input bind:value={query} oninput={doSearch} {placeholder} />
938
+ {#each results as result}
939
+ <a href={buildResultUrl(result)} onclick={() => open = false}>
940
+ <strong>{result.title}</strong>
941
+ {#if result.sectionTitle}<span>— {result.sectionTitle}</span>{/if}
942
+ <p>{result.snippet}</p>
943
+ </a>
944
+ {/each}
945
+ </dialog>
946
+ {/if}
706
947
  ```
707
948
 
708
- ## MCP (Model Context Protocol)
949
+ ### Scroll-to-text in layout
709
950
 
710
- SearchSocket provides an **MCP server** for integration with Claude Code, Claude Desktop, and other MCP-compatible AI tools. This gives AI assistants direct access to your indexed site content for semantic search and page retrieval.
951
+ ```svelte
952
+ <!-- src/routes/+layout.svelte -->
953
+ <script>
954
+ import { afterNavigate } from "$app/navigation";
955
+ import { searchsocketScrollToText } from "searchsocket/sveltekit";
711
956
 
712
- ### Tools
957
+ afterNavigate(searchsocketScrollToText);
958
+ </script>
959
+ ```
713
960
 
714
- **`search(query, opts?)`**
715
- - Semantic search across indexed content
716
- - Returns ranked results with URL, title, snippet, score, and routeFile
717
- - Options: `scope`, `topK` (1-100), `pathPrefix`, `tags`, `groupBy` (`"page"` | `"chunk"`)
961
+ ### Deploy and index
718
962
 
719
- **`get_page(pathOrUrl, opts?)`**
720
- - Retrieve full indexed page content as markdown with frontmatter
721
- - Options: `scope`
963
+ Indexing runs automatically on every Vercel deploy. Set these env vars in the Vercel dashboard:
722
964
 
723
- ### Setup (Claude Code)
965
+ - `UPSTASH_VECTOR_REST_URL`
966
+ - `UPSTASH_VECTOR_REST_TOKEN`
967
+ - `SEARCHSOCKET_AUTO_INDEX=1`
724
968
 
725
- Add a `.mcp.json` file to your project root (safe to commit — no secrets needed since the CLI auto-loads `.env`):
969
+ The Vite plugin handles the rest. Alternatively, use a postbuild script:
726
970
 
727
971
  ```json
728
972
  {
729
- "mcpServers": {
730
- "searchsocket": {
731
- "type": "stdio",
732
- "command": "npx",
733
- "args": ["searchsocket", "mcp"],
734
- "env": {}
735
- }
973
+ "scripts": {
974
+ "build": "vite build",
975
+ "postbuild": "searchsocket index"
736
976
  }
737
977
  }
738
978
  ```
739
979
 
740
- Restart Claude Code. The `search` and `get_page` tools will be available automatically. Verify with:
741
-
742
- ```bash
743
- claude mcp list
744
- ```
745
-
746
- ### Setup (Claude Desktop)
747
-
748
- Add to `~/Library/Application Support/Claude/claude_desktop_config.json`:
980
+ ### Connect Claude Code to the deployed site
749
981
 
750
982
  ```json
751
983
  {
752
984
  "mcpServers": {
753
985
  "searchsocket": {
754
- "command": "npx",
755
- "args": ["searchsocket", "mcp"],
756
- "cwd": "/path/to/your/project"
986
+ "type": "http",
987
+ "url": "https://canopy.dev/api/mcp"
757
988
  }
758
989
  }
759
990
  }
760
991
  ```
761
992
 
762
- Restart Claude Desktop. The tools appear in the MCP menu.
993
+ Now Claude Code can search the live docs, retrieve page content, and find source files — all backed by the production index that stays current with every deploy.
763
994
 
764
- ### HTTP Transport
995
+ ### Excluding pages from search
765
996
 
766
- For non-stdio clients, run the MCP server over HTTP:
767
-
768
- ```bash
769
- npx searchsocket mcp --transport http --port 3338
997
+ ```svelte
998
+ <!-- src/routes/blog/+page.svelte (archive page) -->
999
+ <svelte:head>
1000
+ <meta name="searchsocket-weight" content="0" />
1001
+ </svelte:head>
770
1002
  ```
771
1003
 
772
- This starts a stateless server at `http://127.0.0.1:3338/mcp`. Each POST request creates a fresh server instance with no session persistence.
1004
+ Or with the component:
773
1005
 
774
- ## Environment Variables
1006
+ ```svelte
1007
+ <script>
1008
+ import { SearchSocket } from "searchsocket/svelte";
1009
+ </script>
775
1010
 
776
- The CLI automatically loads `.env` from the working directory on startup. Existing `process.env` values take precedence over `.env` file values. This only applies to CLI commands (`searchsocket index`, `searchsocket mcp`, etc.) — library imports like `searchsocketHandle()` rely on your framework's own `.env` handling (Vite/SvelteKit).
1011
+ <SearchSocket weight={0} />
1012
+ ```
777
1013
 
778
- ### Required
1014
+ ### Vite SSR config
1015
+
1016
+ ```ts
1017
+ // vite.config.ts
1018
+ import { sveltekit } from "@sveltejs/kit/vite";
1019
+ import { defineConfig } from "vite";
1020
+
1021
+ export default defineConfig({
1022
+ plugins: [sveltekit()],
1023
+ ssr: {
1024
+ external: ["searchsocket", "searchsocket/sveltekit", "searchsocket/client"]
1025
+ }
1026
+ });
1027
+ ```
779
1028
 
780
- **Jina AI:**
781
- - `JINA_API_KEY` — Jina AI API key for embeddings and reranking
1029
+ ## Environment Variables
782
1030
 
783
- ### Optional (Turso)
1031
+ ### Required
784
1032
 
785
- **Remote Turso (production):**
786
- - `TURSO_DATABASE_URL` — Turso database URL (e.g., `libsql://my-db.turso.io`)
787
- - `TURSO_AUTH_TOKEN` Turso auth token
1033
+ | Variable | Description |
1034
+ |----------|-------------|
1035
+ | `UPSTASH_VECTOR_REST_URL` | Upstash Vector REST API endpoint |
1036
+ | `UPSTASH_VECTOR_REST_TOKEN` | Upstash Vector REST API token |
788
1037
 
789
- If not set, uses local file DB at `.searchsocket/vectors.db`.
1038
+ ### Optional
790
1039
 
791
- ### Optional (Scope/Build)
1040
+ | Variable | Description |
1041
+ |----------|-------------|
1042
+ | `SEARCHSOCKET_SCOPE` | Override scope (when `scope.mode: "env"`) |
1043
+ | `SEARCHSOCKET_AUTO_INDEX` | Enable build-triggered indexing (`1`, `true`, or `yes`) |
1044
+ | `SEARCHSOCKET_DISABLE_AUTO_INDEX` | Disable build-triggered indexing |
1045
+ | `SEARCHSOCKET_FORCE_REINDEX` | Force full re-index in CI/CD |
792
1046
 
793
- - `SEARCHSOCKET_SCOPE` Override scope (when `scope.mode: "env"`)
794
- - `SEARCHSOCKET_AUTO_INDEX` — Enable build-triggered indexing
795
- - `SEARCHSOCKET_DISABLE_AUTO_INDEX` — Disable build-triggered indexing
1047
+ The CLI automatically loads `.env` from the working directory on startup.
796
1048
 
797
- ## Configuration
1049
+ ## Configuration Reference
798
1050
 
799
- ### Full Example
1051
+ See [docs/config.md](docs/config.md) for the full configuration reference. Here's a representative example:
800
1052
 
801
1053
  ```ts
802
1054
  export default {
@@ -806,41 +1058,24 @@ export default {
806
1058
  },
807
1059
 
808
1060
  scope: {
809
- mode: "git", // "fixed" | "git" | "env"
1061
+ mode: "git", // "fixed" | "git" | "env"
810
1062
  fixed: "main",
811
1063
  sanitize: true
812
1064
  },
813
1065
 
1066
+ exclude: ["/admin/*", "/api/*"],
1067
+ respectRobotsTxt: true,
1068
+
814
1069
  source: {
815
- mode: "build", // "static-output" | "crawl" | "content-files" | "build"
1070
+ mode: "build",
816
1071
  staticOutputDir: "build",
817
- strictRouteMapping: false,
818
-
819
- // Build mode (recommended for CI/CD)
820
1072
  build: {
821
- outputDir: ".svelte-kit/output",
822
- previewTimeout: 30000,
823
1073
  exclude: ["/api/*"],
824
1074
  paramValues: {
825
1075
  "/blog/[slug]": ["hello-world", "getting-started"]
826
1076
  },
827
- discover: false,
828
- seedUrls: ["/"],
829
- maxPages: 200,
830
- maxDepth: 5
831
- },
832
-
833
- // Crawl mode (alternative)
834
- crawl: {
835
- baseUrl: "http://localhost:4173",
836
- routes: ["/", "/docs", "/blog"],
837
- sitemapUrl: "https://example.com/sitemap.xml"
838
- },
839
-
840
- // Content files mode (alternative)
841
- contentFiles: {
842
- globs: ["src/routes/**/*.md"],
843
- baseDir: "."
1077
+ discover: true,
1078
+ maxPages: 200
844
1079
  }
845
1080
  },
846
1081
 
@@ -850,77 +1085,77 @@ export default {
850
1085
  dropSelectors: [".sidebar", ".toc"],
851
1086
  ignoreAttr: "data-search-ignore",
852
1087
  noindexAttr: "data-search-noindex",
853
- respectRobotsNoindex: true
1088
+ imageDescAttr: "data-search-description"
854
1089
  },
855
1090
 
856
1091
  chunking: {
857
- maxChars: 2200,
1092
+ maxChars: 1500,
858
1093
  overlapChars: 200,
859
1094
  minChars: 250,
860
- headingPathDepth: 3,
861
- dontSplitInside: ["code", "table", "blockquote"],
862
- prependTitle: true, // prepend page title to chunk text before embedding
863
- pageSummaryChunk: true // generate synthetic identity chunk per page
864
- },
865
-
866
- embeddings: {
867
- provider: "jina",
868
- model: "jina-embeddings-v5-text-small",
869
- apiKey: "jina_...", // direct API key (or use apiKeyEnv)
870
- apiKeyEnv: "JINA_API_KEY",
871
- batchSize: 64,
872
- concurrency: 4
1095
+ prependTitle: true,
1096
+ pageSummaryChunk: true
873
1097
  },
874
1098
 
875
- vector: {
876
- dimension: 1024, // optional, inferred from first embedding
877
- turso: {
878
- url: "libsql://my-db.turso.io", // direct URL (or use urlEnv)
879
- authToken: "eyJhbGc...", // direct token (or use authTokenEnv)
880
- urlEnv: "TURSO_DATABASE_URL",
881
- authTokenEnv: "TURSO_AUTH_TOKEN",
882
- localPath: ".searchsocket/vectors.db"
883
- }
1099
+ upstash: {
1100
+ urlEnv: "UPSTASH_VECTOR_REST_URL",
1101
+ tokenEnv: "UPSTASH_VECTOR_REST_TOKEN"
884
1102
  },
885
1103
 
886
- rerank: {
887
- enabled: true,
888
- topN: 20,
889
- model: "jina-reranker-v3"
1104
+ search: {
1105
+ dualSearch: true,
1106
+ pageSearchWeight: 0.3
890
1107
  },
891
1108
 
892
1109
  ranking: {
893
1110
  enableIncomingLinkBoost: true,
894
1111
  enableDepthBoost: true,
895
- pageWeights: {
896
- "/": 1.1,
897
- "/docs": 1.15
898
- },
899
- minScore: 0,
1112
+ pageWeights: { "/docs": 1.15 },
1113
+ minScoreRatio: 0.70,
900
1114
  aggregationCap: 5,
901
- aggregationDecay: 0.5,
902
- minChunkScoreRatio: 0.5,
903
- weights: {
904
- incomingLinks: 0.05,
905
- depth: 0.03,
906
- rerank: 1.0,
907
- aggregation: 0.1
908
- }
1115
+ aggregationDecay: 0.5
909
1116
  },
910
1117
 
911
1118
  api: {
912
1119
  path: "/api/search",
913
- cors: {
914
- allowOrigins: ["https://example.com"]
915
- },
916
- rateLimit: {
917
- windowMs: 60_000,
918
- max: 60
919
- }
1120
+ cors: { allowOrigins: ["https://example.com"] }
1121
+ },
1122
+
1123
+ mcp: {
1124
+ enable: true,
1125
+ handle: { path: "/api/mcp" }
1126
+ },
1127
+
1128
+ llmsTxt: {
1129
+ enable: true,
1130
+ title: "My Project",
1131
+ description: "Documentation for My Project"
1132
+ },
1133
+
1134
+ state: {
1135
+ dir: ".searchsocket"
920
1136
  }
921
1137
  };
922
1138
  ```
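
To see how the `chunking.maxChars` and `chunking.overlapChars` values in the example relate, here is a deliberately simplified sliding-window sketch. It is not SearchSocket's chunker, which may split on headings and honor `minChars`; `chunkText` is a hypothetical name and the fixed-width window is an assumption made to keep the arithmetic visible.

```ts
// Hypothetical sketch: fixed-width chunks that share overlapChars characters.
function chunkText(text: string, maxChars: number, overlapChars: number): string[] {
  if (overlapChars >= maxChars) {
    throw new Error("overlapChars must be smaller than maxChars");
  }
  const chunks: string[] = [];
  const step = maxChars - overlapChars; // how far each new window advances
  for (let start = 0; start < text.length; start += step) {
    chunks.push(text.slice(start, start + maxChars));
    if (start + maxChars >= text.length) break; // the last window reached the end
  }
  return chunks;
}
```

With `maxChars: 1500` and `overlapChars: 200`, each window advances 1300 characters, so a sentence that straddles a boundary still appears whole in at least one chunk as long as it is shorter than the overlap.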
923
1139
 
1140
+ ## CI/CD
1141
+
1142
+ See [docs/ci.md](docs/ci.md) for ready-to-use GitHub Actions workflows covering:
1143
+
1144
+ - Main branch indexing on push
1145
+ - PR dry-run validation
1146
+ - Preview branch scope isolation
1147
+ - Scheduled scope pruning
1148
+ - Vercel build-triggered indexing
1149
+
1150
+ ## Further Reading
1151
+
1152
+ - [Building a Search UI](docs/search-ui.md) — Cmd+K modals, scoped search, styling, and API reference
1153
+ - [Tuning Search Relevance](docs/tuning.md) — visual playground, ranking parameters, and search quality testing
1154
+ - [Configuration Reference](docs/config.md) — all config options, indexing hooks, and custom records
1155
+ - [CI/CD Workflows](docs/ci.md) — GitHub Actions and Vercel integration
1156
+ - [MCP over HTTP Guide](docs/mcp-claude-code.md) — detailed HTTP MCP setup for Claude Code
1157
+ - [Troubleshooting](docs/troubleshooting.md) — common issues, diagnostics, and FAQ
1158
+
924
1159
  ## License
925
1160
 
926
1161
  MIT