searchsocket 0.4.0 → 0.6.0
- package/README.md +742 -507
- package/dist/cli.js +3504 -1412
- package/dist/client.cjs +41 -117
- package/dist/client.d.cts +3 -17
- package/dist/client.d.ts +3 -17
- package/dist/client.js +41 -117
- package/dist/index.cjs +2553 -1499
- package/dist/index.d.cts +133 -34
- package/dist/index.d.ts +133 -34
- package/dist/index.js +2551 -1494
- package/dist/plugin-C61L-ykY.d.ts +37 -0
- package/dist/plugin-DoBW1gkK.d.cts +37 -0
- package/dist/scroll.cjs +185 -0
- package/dist/scroll.d.cts +42 -0
- package/dist/scroll.d.ts +42 -0
- package/dist/scroll.js +183 -0
- package/dist/sveltekit.cjs +2769 -1389
- package/dist/sveltekit.d.cts +3 -43
- package/dist/sveltekit.d.ts +3 -43
- package/dist/sveltekit.js +2769 -1389
- package/dist/templates/search-dialog/SearchDialog.svelte +175 -0
- package/dist/templates/search-input/SearchInput.svelte +151 -0
- package/dist/templates/search-results/SearchResults.svelte +75 -0
- package/dist/{types-z2dw3H6E.d.cts → types-029hl6P2.d.cts} +210 -134
- package/dist/{types-z2dw3H6E.d.ts → types-029hl6P2.d.ts} +210 -134
- package/package.json +28 -3
- package/src/svelte/SearchSocket.svelte +35 -0
- package/src/svelte/index.svelte.ts +181 -0
package/README.md
CHANGED
|
@@ -1,34 +1,44 @@
|
|
|
1
1
|
# SearchSocket
|
|
2
2
|
|
|
3
|
-
Semantic site search and MCP retrieval for SvelteKit content projects.
|
|
3
|
+
Semantic site search and MCP retrieval for SvelteKit content projects. Index your site, search it from the browser or AI tools, and scroll users to the exact content they're looking for.
|
|
4
4
|
|
|
5
|
-
**Requirements**: Node.js >= 20
|
|
5
|
+
**Requirements**: Node.js >= 20 | **Backend**: [Upstash Vector](https://upstash.com/docs/vector/overall/getstarted) | **License**: MIT
|
|
6
|
+
|
|
7
|
+
## How it works
|
|
8
|
+
|
|
9
|
+
```
|
|
10
|
+
SvelteKit Pages → Extractor (Cheerio + Turndown) → Chunker → Upstash Vector
|
|
11
|
+
↓
|
|
12
|
+
Search UI ← SvelteKit API Hook ← Search Engine + Ranking
|
|
13
|
+
↓
|
|
14
|
+
MCP Endpoint → Claude Code / Claude Desktop
|
|
15
|
+
```
|
|
16
|
+
|
|
17
|
+
SearchSocket extracts content from your SvelteKit site, converts it to markdown, splits it into chunks, and stores them in Upstash Vector. At runtime, the SvelteKit hook serves both a search API for your frontend and an MCP endpoint for AI tools.
|
|
6
18
|
|
|
7
19
|
## Features
|
|
8
20
|
|
|
9
|
-
- **
|
|
10
|
-
- **
|
|
11
|
-
- **
|
|
12
|
-
- **
|
|
13
|
-
- **
|
|
14
|
-
- **
|
|
15
|
-
|
|
16
|
-
|
|
17
|
-
- **
|
|
18
|
-
- **MCP Server**: Model Context Protocol tools for search and page retrieval
|
|
19
|
-
- **Git-Tracked Markdown Mirror**: Commit-safe deterministic markdown outputs
|
|
21
|
+
- **Semantic + keyword search** — Upstash Vector handles hybrid search with built-in reranking and input enrichment
|
|
22
|
+
- **Dual search** — parallel page-level and chunk-level queries with configurable score blending
|
|
23
|
+
- **Scroll-to-text** — auto-scroll to the matching section when a user clicks a search result, with CSS Highlight API and Text Fragment support
|
|
24
|
+
- **SvelteKit integration** — server hook for the search API, Vite plugin for build-triggered indexing
|
|
25
|
+
- **Svelte 5 components** — reactive `createSearch` store and `<SearchSocket>` metadata component
|
|
26
|
+
- **MCP server** — six tools for Claude Code, Claude Desktop, and other MCP clients (stdio + HTTP)
|
|
27
|
+
- **llms.txt generation** — auto-generate LLM-friendly site indexes during indexing
|
|
28
|
+
- **Four source modes** — index from static output, build manifest, a running server, or raw markdown files
|
|
29
|
+
- **CLI** — init, index, search, dev, status, doctor, clean, prune, test, mcp, add
|
|
20
30
|
|
|
21
31
|
## Install
|
|
22
32
|
|
|
23
33
|
```bash
|
|
24
|
-
# pnpm
|
|
25
34
|
pnpm add -D searchsocket
|
|
26
|
-
|
|
27
|
-
# npm
|
|
28
|
-
npm install -D searchsocket
|
|
29
35
|
```
|
|
30
36
|
|
|
31
|
-
SearchSocket is typically a dev dependency
|
|
37
|
+
SearchSocket is typically a dev dependency since indexing runs at build time. If you use `searchsocketHandle()` at runtime (e.g., in a Node server adapter or serving the MCP endpoint from a production deployment), add it as a regular dependency:
|
|
38
|
+
|
|
39
|
+
```bash
|
|
40
|
+
pnpm add searchsocket
|
|
41
|
+
```
|
|
32
42
|
|
|
33
43
|
## Quickstart
|
|
34
44
|
|
|
@@ -38,100 +48,134 @@ SearchSocket is typically a dev dependency for CLI indexing. If you use `searchs
|
|
|
38
48
|
pnpm searchsocket init
|
|
39
49
|
```
|
|
40
50
|
|
|
41
|
-
|
|
42
|
-
- `searchsocket.config.ts` — minimal config file
|
|
43
|
-
- `.searchsocket/` — state directory (added to `.gitignore`)
|
|
51
|
+
Creates `searchsocket.config.ts` and the `.searchsocket/` state directory, wires up your SvelteKit hooks and Vite config, and generates `.mcp.json` for Claude Code.
|
|
44
52
|
|
|
45
53
|
### 2. Configure
|
|
46
54
|
|
|
47
55
|
Minimal config (`searchsocket.config.ts`):
|
|
48
56
|
|
|
49
57
|
```ts
|
|
50
|
-
export default {
|
|
51
|
-
embeddings: { apiKeyEnv: "JINA_API_KEY" }
|
|
52
|
-
};
|
|
58
|
+
export default {};
|
|
53
59
|
```
|
|
54
60
|
|
|
55
|
-
|
|
56
|
-
- **Development**: Uses local file DB at `.searchsocket/vectors.db`
|
|
57
|
-
- **Production**: Set `TURSO_DATABASE_URL` and `TURSO_AUTH_TOKEN` to use remote Turso
|
|
61
|
+
That's it — defaults handle the rest. SearchSocket reads `UPSTASH_VECTOR_REST_URL` and `UPSTASH_VECTOR_REST_TOKEN` from your environment automatically.
|
|
58
62
|
|
|
59
|
-
### 3.
|
|
63
|
+
### 3. Set environment variables
|
|
60
64
|
|
|
61
|
-
|
|
65
|
+
```bash
|
|
66
|
+
# .env
|
|
67
|
+
UPSTASH_VECTOR_REST_URL=https://...
|
|
68
|
+
UPSTASH_VECTOR_REST_TOKEN=...
|
|
69
|
+
```
|
|
70
|
+
|
|
71
|
+
Create an [Upstash Vector index](https://console.upstash.com/vector) with the `bge-large-en-v1.5` embedding model (1024 dimensions). Copy the REST URL and token.
|
|
72
|
+
|
|
73
|
+
### 4. Add the SvelteKit hook
|
|
74
|
+
|
|
75
|
+
The `init` command does this for you, but if you need to do it manually:
|
|
62
76
|
|
|
63
77
|
```ts
|
|
78
|
+
// src/hooks.server.ts
|
|
64
79
|
import { searchsocketHandle } from "searchsocket/sveltekit";
|
|
65
80
|
|
|
66
81
|
export const handle = searchsocketHandle();
|
|
67
82
|
```
|
|
68
83
|
|
|
69
|
-
This exposes `POST /api/search`
|
|
84
|
+
This exposes `POST /api/search`, `GET /api/search/health`, the MCP endpoint at `/api/mcp`, and page retrieval routes.
|
|
85
|
+
|
|
86
|
+
If you run into SSR bundling issues, mark SearchSocket as external in your Vite config:
|
|
87
|
+
|
|
88
|
+
```ts
|
|
89
|
+
// vite.config.ts
|
|
90
|
+
export default defineConfig({
|
|
91
|
+
plugins: [sveltekit()],
|
|
92
|
+
ssr: {
|
|
93
|
+
external: ["searchsocket", "searchsocket/sveltekit", "searchsocket/client"]
|
|
94
|
+
}
|
|
95
|
+
});
|
|
96
|
+
```
|
|
70
97
|
|
|
71
|
-
###
|
|
98
|
+
### 5. Add search to your frontend
|
|
72
99
|
|
|
73
|
-
|
|
100
|
+
Copy the search dialog template into your project:
|
|
74
101
|
|
|
75
|
-
Development (`.env`):
|
|
76
102
|
```bash
|
|
77
|
-
|
|
103
|
+
pnpm searchsocket add search-dialog
|
|
78
104
|
```
|
|
79
105
|
|
|
80
|
-
|
|
81
|
-
|
|
82
|
-
|
|
83
|
-
|
|
84
|
-
|
|
106
|
+
This copies a Svelte 5 component to `src/lib/components/search/SearchDialog.svelte` with Cmd+K built in. Import it in your layout and add the scroll-to-text handler:
|
|
107
|
+
|
|
108
|
+
```svelte
|
|
109
|
+
<!-- src/routes/+layout.svelte -->
|
|
110
|
+
<script>
|
|
111
|
+
import { afterNavigate } from "$app/navigation";
|
|
112
|
+
import { searchsocketScrollToText } from "searchsocket/sveltekit";
|
|
113
|
+
import SearchDialog from "$lib/components/search/SearchDialog.svelte";
|
|
114
|
+
|
|
115
|
+
afterNavigate(searchsocketScrollToText);
|
|
116
|
+
</script>
|
|
117
|
+
|
|
118
|
+
<SearchDialog />
|
|
119
|
+
|
|
120
|
+
<slot />
|
|
85
121
|
```
|
|
86
122
|
|
|
87
|
-
|
|
123
|
+
Users can now press Cmd+K to search. See [Building a Search UI](docs/search-ui.md) for scoped search, custom styling, and more patterns.
|
|
124
|
+
|
|
125
|
+
### 6. Deploy
|
|
126
|
+
|
|
127
|
+
SearchSocket is designed to index automatically on deploy. The `init` command already added the Vite plugin to your config. Set these environment variables on your hosting platform (Vercel, Cloudflare, etc.):
|
|
128
|
+
|
|
129
|
+
| Variable | Value |
|
|
130
|
+
|----------|-------|
|
|
131
|
+
| `UPSTASH_VECTOR_REST_URL` | Your Upstash Vector REST URL |
|
|
132
|
+
| `UPSTASH_VECTOR_REST_TOKEN` | Your Upstash Vector REST token |
|
|
133
|
+
| `SEARCHSOCKET_AUTO_INDEX` | `1` |
|
|
134
|
+
|
|
135
|
+
Every deploy will build your site, index the content, and serve the search API — fully automated.
|
|
136
|
+
|
|
137
|
+
For local testing, you can also build and index manually:
|
|
88
138
|
|
|
89
139
|
```bash
|
|
90
|
-
pnpm
|
|
140
|
+
pnpm build
|
|
141
|
+
pnpm searchsocket index
|
|
142
|
+
```
|
|
143
|
+
|
|
144
|
+
### 7. Connect Claude Code (optional)
|
|
145
|
+
|
|
146
|
+
Point Claude Code at your deployed site's MCP endpoint:
|
|
147
|
+
|
|
148
|
+
```json
|
|
149
|
+
{
|
|
150
|
+
"mcpServers": {
|
|
151
|
+
"searchsocket": {
|
|
152
|
+
"type": "http",
|
|
153
|
+
"url": "https://your-site.com/api/mcp"
|
|
154
|
+
}
|
|
155
|
+
}
|
|
156
|
+
}
|
|
91
157
|
```
|
|
92
158
|
|
|
93
|
-
|
|
94
|
-
- **`static-output`** (default): Reads prerendered HTML from `build/`
|
|
95
|
-
- **`build`**: Discovers routes from SvelteKit build manifest and renders via preview server
|
|
96
|
-
- **`crawl`**: Fetches pages from a running HTTP server
|
|
97
|
-
- **`content-files`**: Reads markdown/svelte source files directly
|
|
159
|
+
See [MCP Server](#mcp-server) for authentication and other options.
|
|
98
160
|
|
|
99
|
-
|
|
100
|
-
- Extracts content from `<main>` (configurable), including `<meta>` description and keywords
|
|
101
|
-
- Chunks text with semantic heading boundaries
|
|
102
|
-
- Prepends page title to each chunk for embedding context
|
|
103
|
-
- Generates a synthetic summary chunk per page for identity matching
|
|
104
|
-
- Generates embeddings via Jina AI (with task-specific LoRA adapters for indexing vs search)
|
|
105
|
-
- Stores vectors in Turso/libSQL with cosine similarity index
|
|
161
|
+
### Querying the API directly
|
|
106
162
|
|
|
107
|
-
|
|
163
|
+
The search API is also available via HTTP and CLI:
|
|
108
164
|
|
|
109
|
-
**Via API:**
|
|
110
165
|
```bash
|
|
166
|
+
# cURL
|
|
111
167
|
curl -X POST http://localhost:5173/api/search \
|
|
112
168
|
-H "content-type: application/json" \
|
|
113
169
|
-d '{"q":"getting started","topK":5,"groupBy":"page"}'
|
|
114
|
-
```
|
|
115
|
-
|
|
116
|
-
**Via client library:**
|
|
117
|
-
```ts
|
|
118
|
-
import { createSearchClient } from "searchsocket/client";
|
|
119
170
|
|
|
120
|
-
|
|
121
|
-
|
|
122
|
-
q: "getting started",
|
|
123
|
-
topK: 5,
|
|
124
|
-
groupBy: "page",
|
|
125
|
-
pathPrefix: "/docs"
|
|
126
|
-
});
|
|
171
|
+
# CLI
|
|
172
|
+
pnpm searchsocket search --q "getting started" --top-k 5
|
|
127
173
|
```
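The cURL request above can also be issued from TypeScript without the client library. The following is a minimal sketch, not part of SearchSocket: `buildSearchRequest`, `searchSite`, and the `FetchLike` type are hypothetical names, the request fields come from the cURL example, and the response type is abridged and partly assumed.

```ts
// Request fields taken from the cURL example above.
interface SearchRequest {
  q: string;
  topK?: number;
  groupBy?: "page" | "chunk";
}

// Abridged response shape; fields beyond `q`, `results`, and `meta` are assumptions.
interface SearchResponse {
  q: string;
  results: Array<{ url: string; title: string; score: number }>;
  meta: { timingsMs: { total: number } };
}

interface RequestInitLike {
  method: string;
  headers: Record<string, string>;
  body: string;
}

// Minimal structural type so any fetch-compatible function can be injected.
type FetchLike = (
  url: string,
  init: RequestInitLike
) => Promise<{ ok: boolean; status: number; json(): Promise<unknown> }>;

// Hypothetical helper: builds the POST payload for /api/search.
function buildSearchRequest(req: SearchRequest): RequestInitLike {
  return {
    method: "POST",
    headers: { "content-type": "application/json" },
    body: JSON.stringify(req),
  };
}

// Hypothetical wrapper: sends the request and fails loudly on non-2xx responses.
async function searchSite(
  fetchImpl: FetchLike,
  req: SearchRequest,
  baseUrl = "http://localhost:5173" // dev server address from the cURL example
): Promise<SearchResponse> {
  const res = await fetchImpl(`${baseUrl}/api/search`, buildSearchRequest(req));
  if (!res.ok) throw new Error(`Search failed with HTTP ${res.status}`);
  return (await res.json()) as SearchResponse;
}
```

In a browser or a recent Node.js runtime, the global `fetch` can be passed as `fetchImpl`; injecting it keeps the helper easy to test.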
|
|
128
174
|
|
|
129
|
-
|
|
130
|
-
|
|
131
|
-
|
|
132
|
-
```
|
|
175
|
+
### Response format
|
|
176
|
+
|
|
177
|
+
With `groupBy: "page"` (the default):
|
|
133
178
|
|
|
134
|
-
**Response** (with `groupBy: "page"`, the default):
|
|
135
179
|
```json
|
|
136
180
|
{
|
|
137
181
|
"q": "getting started",
|
|
@@ -161,18 +205,16 @@ pnpm searchsocket search --q "getting started" --top-k 5 --path-prefix /docs
|
|
|
161
205
|
}
|
|
162
206
|
],
|
|
163
207
|
"meta": {
|
|
164
|
-
"timingsMs": { "
|
|
165
|
-
"usedRerank": false,
|
|
166
|
-
"modelId": "jina-embeddings-v5-text-small"
|
|
208
|
+
"timingsMs": { "total": 135 }
|
|
167
209
|
}
|
|
168
210
|
}
|
|
169
211
|
```
|
|
170
212
|
|
|
171
|
-
The `chunks` array
|
|
213
|
+
The `chunks` array contains matching sections within each page. Use `groupBy: "chunk"` for flat per-chunk results without page aggregation.
|
|
172
214
|
|
|
173
215
|
## Source Modes
|
|
174
216
|
|
|
175
|
-
SearchSocket supports four
|
|
217
|
+
SearchSocket supports four ways to load your site content for indexing.
|
|
176
218
|
|
|
177
219
|
### `static-output` (default)
|
|
178
220
|
|
|
@@ -182,50 +224,37 @@ Reads prerendered HTML files from SvelteKit's build output directory.
|
|
|
182
224
|
export default {
|
|
183
225
|
source: {
|
|
184
226
|
mode: "static-output",
|
|
185
|
-
staticOutputDir: "build"
|
|
227
|
+
staticOutputDir: "build" // default
|
|
186
228
|
}
|
|
187
229
|
};
|
|
188
230
|
```
|
|
189
231
|
|
|
190
|
-
Best for
|
|
232
|
+
Best for fully prerendered sites. Run `vite build` first, then `searchsocket index`.
|
|
191
233
|
|
|
192
234
|
### `build`
|
|
193
235
|
|
|
194
|
-
Discovers routes
|
|
236
|
+
Discovers routes from SvelteKit's build manifest and renders via an ephemeral `vite preview` server. No manual route lists needed.
|
|
195
237
|
|
|
196
238
|
```ts
|
|
197
239
|
export default {
|
|
198
240
|
source: {
|
|
241
|
+
mode: "build",
|
|
199
242
|
build: {
|
|
200
|
-
|
|
201
|
-
|
|
202
|
-
exclude: ["/api/*", "/admin/*"], // glob patterns to skip
|
|
203
|
-
paramValues: { // values for dynamic routes
|
|
243
|
+
exclude: ["/api/*", "/admin/*"],
|
|
244
|
+
paramValues: {
|
|
204
245
|
"/blog/[slug]": ["hello-world", "getting-started"],
|
|
205
246
|
"/docs/[category]/[page]": ["guides/quickstart", "api/search"]
|
|
206
247
|
},
|
|
207
|
-
discover: true,
|
|
208
|
-
seedUrls: ["/"],
|
|
209
|
-
maxPages: 200,
|
|
210
|
-
maxDepth: 5
|
|
248
|
+
discover: true, // crawl internal links to find more pages
|
|
249
|
+
seedUrls: ["/"],
|
|
250
|
+
maxPages: 200,
|
|
251
|
+
maxDepth: 5
|
|
211
252
|
}
|
|
212
253
|
}
|
|
213
254
|
};
|
|
214
255
|
```
|
|
215
256
|
|
|
216
|
-
Best for
|
|
217
|
-
|
|
218
|
-
**How it works**:
|
|
219
|
-
1. Parses `.svelte-kit/output/server/manifest-full.js` to discover all page routes
|
|
220
|
-
2. Expands dynamic routes using `paramValues` (skips dynamic routes without values)
|
|
221
|
-
3. Starts an ephemeral `vite preview` server on a random port
|
|
222
|
-
4. Fetches all routes concurrently for SSR-rendered HTML
|
|
223
|
-
5. Provides exact route-to-file mapping (no heuristic matching needed)
|
|
224
|
-
6. Shuts down the preview server
|
|
225
|
-
|
|
226
|
-
**Dynamic routes**: Each key in `paramValues` maps to a route ID (e.g., `/blog/[slug]`) or its URL equivalent. Each value in the array replaces all `[param]` segments in the URL. Routes with layout groups like `/(app)/blog/[slug]` also match the URL key `/blog/[slug]`.
|
|
227
|
-
|
|
228
|
-
**Link discovery**: Enable `discover: true` to automatically find pages by crawling internal links from `seedUrls`. This is useful when dynamic routes have many parameter values that are impractical to enumerate. The crawler respects `maxPages` and `maxDepth` limits and only follows links within the same origin.
|
|
257
|
+
Best for CI/CD pipelines: `vite build && searchsocket index` with zero route configuration.
|
|
229
258
|
|
|
230
259
|
### `crawl`
|
|
231
260
|
|
|
@@ -234,24 +263,24 @@ Fetches pages from a running HTTP server.
|
|
|
234
263
|
```ts
|
|
235
264
|
export default {
|
|
236
265
|
source: {
|
|
266
|
+
mode: "crawl",
|
|
237
267
|
crawl: {
|
|
238
268
|
baseUrl: "http://localhost:4173",
|
|
239
|
-
routes: ["/", "/docs", "/blog"],
|
|
240
|
-
sitemapUrl: "https://example.com/sitemap.xml"
|
|
269
|
+
routes: ["/", "/docs", "/blog"],
|
|
270
|
+
sitemapUrl: "https://example.com/sitemap.xml"
|
|
241
271
|
}
|
|
242
272
|
}
|
|
243
273
|
};
|
|
244
274
|
```
|
|
245
275
|
|
|
246
|
-
If `routes` is omitted and no `sitemapUrl` is set, defaults to crawling `["/"]` only.
|
|
247
|
-
|
|
248
276
|
### `content-files`
|
|
249
277
|
|
|
250
|
-
Reads markdown and
|
|
278
|
+
Reads markdown and Svelte source files directly, without building or serving.
|
|
251
279
|
|
|
252
280
|
```ts
|
|
253
281
|
export default {
|
|
254
282
|
source: {
|
|
283
|
+
mode: "content-files",
|
|
255
284
|
contentFiles: {
|
|
256
285
|
globs: ["src/routes/**/*.md", "content/**/*.md"],
|
|
257
286
|
baseDir: "."
|
|
@@ -262,541 +291,764 @@ export default {
|
|
|
262
291
|
|
|
263
292
|
## Client Library
|
|
264
293
|
|
|
265
|
-
|
|
294
|
+
### `createSearchClient(options?)`
|
|
295
|
+
|
|
296
|
+
Lightweight browser-side search client.
|
|
266
297
|
|
|
267
298
|
```ts
|
|
268
299
|
import { createSearchClient } from "searchsocket/client";
|
|
269
300
|
|
|
270
301
|
const client = createSearchClient({
|
|
271
|
-
endpoint: "/api/search",
|
|
272
|
-
fetchImpl: fetch
|
|
302
|
+
endpoint: "/api/search", // default
|
|
303
|
+
fetchImpl: fetch // override for SSR or testing
|
|
273
304
|
});
|
|
274
305
|
|
|
275
|
-
const
|
|
306
|
+
const { results } = await client.search({
|
|
276
307
|
q: "deployment guide",
|
|
277
308
|
topK: 8,
|
|
278
309
|
groupBy: "page",
|
|
279
310
|
pathPrefix: "/docs",
|
|
280
311
|
tags: ["guide"],
|
|
281
|
-
|
|
312
|
+
filters: { version: 2 },
|
|
313
|
+
maxSubResults: 3
|
|
282
314
|
});
|
|
283
|
-
|
|
284
|
-
for (const result of response.results) {
|
|
285
|
-
console.log(result.url, result.title, result.score);
|
|
286
|
-
if (result.chunks) {
|
|
287
|
-
for (const chunk of result.chunks) {
|
|
288
|
-
console.log(" ", chunk.sectionTitle, chunk.score);
|
|
289
|
-
}
|
|
290
|
-
}
|
|
291
|
-
}
|
|
292
315
|
```
|
|
293
316
|
|
|
294
|
-
|
|
295
|
-
|
|
296
|
-
SearchSocket uses **Turso** (libSQL) as its single vector backend, providing a unified experience across development and production.
|
|
297
|
-
|
|
298
|
-
### Local Development
|
|
299
|
-
|
|
300
|
-
By default, SearchSocket uses a **local file database**:
|
|
301
|
-
- Path: `.searchsocket/vectors.db` (configurable)
|
|
302
|
-
- No account or API keys needed
|
|
303
|
-
- Full vector search with `libsql_vector_idx` and `vector_top_k`
|
|
304
|
-
- Perfect for local development and CI testing
|
|
305
|
-
|
|
306
|
-
### Production (Remote Turso)
|
|
307
|
-
|
|
308
|
-
For production, switch to **Turso's hosted service**:
|
|
309
|
-
|
|
310
|
-
1. **Sign up for Turso** (free tier available):
|
|
311
|
-
```bash
|
|
312
|
-
# Install Turso CLI
|
|
313
|
-
brew install tursodatabase/tap/turso
|
|
314
|
-
|
|
315
|
-
# Sign up
|
|
316
|
-
turso auth signup
|
|
317
|
+
### `buildResultUrl(result)`
|
|
317
318
|
|
|
318
|
-
|
|
319
|
-
turso db create searchsocket-prod
|
|
319
|
+
Builds a URL from a search result that includes scroll-to-text metadata:
|
|
320
320
|
|
|
321
|
-
|
|
322
|
-
|
|
323
|
-
|
|
324
|
-
```
|
|
325
|
-
|
|
326
|
-
2. **Set environment variables**:
|
|
327
|
-
```bash
|
|
328
|
-
TURSO_DATABASE_URL=libsql://searchsocket-prod-xxx.turso.io
|
|
329
|
-
TURSO_AUTH_TOKEN=eyJhbGc...
|
|
330
|
-
```
|
|
331
|
-
|
|
332
|
-
3. **Index normally** — SearchSocket auto-detects the remote URL and uses it.
|
|
333
|
-
|
|
334
|
-
### Direct Credential Passing
|
|
335
|
-
|
|
336
|
-
Instead of environment variables, you can pass credentials directly in the config. This is useful for serverless deployments or multi-tenant setups:
|
|
321
|
+
- `_ssk` query parameter — section title for SvelteKit client-side navigation
|
|
322
|
+
- `_sskt` query parameter — text target snippet for precise scroll
|
|
323
|
+
- `#:~:text=` — [Text Fragment](https://developer.mozilla.org/en-US/docs/Web/URI/Fragment/Text_fragments) for native browser scroll on full page loads
|
|
337
324
|
|
|
338
325
|
```ts
|
|
339
|
-
|
|
340
|
-
embeddings: {
|
|
341
|
-
apiKey: "jina_..." // direct API key (takes precedence over apiKeyEnv)
|
|
342
|
-
},
|
|
343
|
-
vector: {
|
|
344
|
-
turso: {
|
|
345
|
-
url: "libsql://my-db.turso.io", // direct URL
|
|
346
|
-
authToken: "eyJhbGc..." // direct auth token
|
|
347
|
-
}
|
|
348
|
-
}
|
|
349
|
-
};
|
|
350
|
-
```
|
|
326
|
+
import { buildResultUrl } from "searchsocket/client";
|
|
351
327
|
|
|
352
|
-
|
|
353
|
-
|
|
354
|
-
|
|
355
|
-
|
|
356
|
-
When switching embedding models (e.g., from a 1536-dim model to Jina's 1024-dim), the vector dimension changes. SearchSocket automatically detects this and recreates the chunks table with the new dimension — no manual intervention needed. A full re-index (`--force`) is still required after switching models.
|
|
328
|
+
const href = buildResultUrl(result);
|
|
329
|
+
// "/docs/getting-started?_ssk=Installation&_sskt=Install+with+pnpm#:~:text=Install%20with%20pnpm"
|
|
330
|
+
```
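To make the three URL parts concrete, here is a rough sketch of how such a URL could be assembled. It is an illustration only and not the library's implementation: `sketchResultUrl` and its parameters are hypothetical, and the encoding choices are inferred from the example output above.

```ts
// Form-style query encoding (spaces become "+"), matching the example URL above.
function encodeQueryValue(value: string): string {
  return encodeURIComponent(value).replace(/%20/g, "+");
}

// Hypothetical stand-in for buildResultUrl(); takes the pieces as plain arguments.
function sketchResultUrl(path: string, sectionTitle?: string, textTarget?: string): string {
  const params: string[] = [];
  if (sectionTitle) params.push(`_ssk=${encodeQueryValue(sectionTitle)}`);
  if (textTarget) params.push(`_sskt=${encodeQueryValue(textTarget)}`);

  const query = params.length > 0 ? `?${params.join("&")}` : "";
  // Text Fragment directive for native browser scrolling on full page loads.
  const fragment = textTarget ? `#:~:text=${encodeURIComponent(textTarget)}` : "";
  return `${path}${query}${fragment}`;
}
```

Note that a complete Text Fragment encoder would also need to percent-encode characters that are special in the directive syntax (such as `-` and `,`); this sketch leaves that out.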
|
|
357
331
|
|
|
358
|
-
|
|
332
|
+
## Svelte 5 Integration
|
|
333
|
+
|
|
334
|
+
### `createSearch(options?)`
|
|
335
|
+
|
|
336
|
+
A reactive search store built on Svelte 5 runes with debouncing and LRU caching.
|
|
337
|
+
|
|
338
|
+
```svelte
|
|
339
|
+
<script>
|
|
340
|
+
import { createSearch } from "searchsocket/svelte";
|
|
341
|
+
import { buildResultUrl } from "searchsocket/client";
|
|
342
|
+
|
|
343
|
+
const search = createSearch({
|
|
344
|
+
endpoint: "/api/search",
|
|
345
|
+
debounce: 250, // ms (default)
|
|
346
|
+
cache: true, // LRU result caching (default)
|
|
347
|
+
cacheSize: 50, // max cached queries (default)
|
|
348
|
+
topK: 10,
|
|
349
|
+
groupBy: "page",
|
|
350
|
+
pathPrefix: "/docs" // scope search to a section
|
|
351
|
+
});
|
|
352
|
+
</script>
|
|
353
|
+
|
|
354
|
+
<input bind:value={search.query} placeholder="Search docs..." />
|
|
355
|
+
|
|
356
|
+
{#if search.loading}
|
|
357
|
+
<p>Searching...</p>
|
|
358
|
+
{/if}
|
|
359
|
+
|
|
360
|
+
{#if search.error}
|
|
361
|
+
<p class="error">{search.error.message}</p>
|
|
362
|
+
{/if}
|
|
363
|
+
|
|
364
|
+
{#each search.results as result}
|
|
365
|
+
<a href={buildResultUrl(result)}>
|
|
366
|
+
<strong>{result.title}</strong>
|
|
367
|
+
{#if result.sectionTitle}
|
|
368
|
+
<span>— {result.sectionTitle}</span>
|
|
369
|
+
{/if}
|
|
370
|
+
</a>
|
|
371
|
+
<p>{result.snippet}</p>
|
|
372
|
+
{/each}
|
|
373
|
+
```
|
|
359
374
|
|
|
360
|
-
|
|
361
|
-
- **Local-first development** — zero external dependencies for local dev
|
|
362
|
-
- **Production-ready** — same codebase scales to remote hosted DB
|
|
363
|
-
- **Cost-effective** — Turso free tier includes 9GB storage, 500M row reads/month
|
|
364
|
-
- **Vector search native** — `F32_BLOB` vectors, cosine similarity index, `vector_top_k` ANN queries
|
|
375
|
+
Call `search.destroy()` to clean up when no longer needed (automatic in component context).
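The LRU caching mentioned for `createSearch` can be pictured with a small cache keyed by query string. This is a generic sketch of the technique, assuming a `Map`-based design; `LruCache` is a hypothetical class and says nothing about how SearchSocket implements its cache internally.

```ts
// Generic LRU cache relying on Map's insertion order: oldest entry first.
class LruCache<V> {
  private entries = new Map<string, V>();
  private maxSize: number;

  constructor(maxSize = 50) {
    // 50 mirrors the documented `cacheSize` default.
    this.maxSize = maxSize;
  }

  get(key: string): V | undefined {
    const value = this.entries.get(key);
    if (value === undefined) return undefined;
    // Re-insert so this key becomes the most recently used.
    this.entries.delete(key);
    this.entries.set(key, value);
    return value;
  }

  set(key: string, value: V): void {
    this.entries.delete(key);
    this.entries.set(key, value);
    if (this.entries.size > this.maxSize) {
      // Evict the least recently used entry (first key in iteration order).
      const oldest = this.entries.keys().next().value;
      if (oldest !== undefined) this.entries.delete(oldest);
    }
  }

  get size(): number {
    return this.entries.size;
  }
}
```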
|
|
365
376
|
|
|
366
|
-
|
|
377
|
+
### `<SearchSocket>` component
|
|
367
378
|
|
|
368
|
-
|
|
379
|
+
Declarative meta tag component for controlling per-page search behavior:
|
|
369
380
|
|
|
370
|
-
|
|
381
|
+
```svelte
|
|
382
|
+
<script>
|
|
383
|
+
import { SearchSocket } from "searchsocket/svelte";
|
|
384
|
+
</script>
|
|
371
385
|
|
|
372
|
-
|
|
386
|
+
<!-- Boost this page's search ranking -->
|
|
387
|
+
<SearchSocket weight={1.2} />
|
|
373
388
|
|
|
374
|
-
|
|
389
|
+
<!-- Exclude from search -->
|
|
390
|
+
<SearchSocket noindex />
|
|
375
391
|
|
|
376
|
-
|
|
377
|
-
|
|
378
|
-
import { searchsocketHandle } from "searchsocket/sveltekit";
|
|
392
|
+
<!-- Add filterable tags -->
|
|
393
|
+
<SearchSocket tags={["guide", "advanced"]} />
|
|
379
394
|
|
|
380
|
-
|
|
381
|
-
|
|
382
|
-
project: { id: "my-docs-site" },
|
|
383
|
-
source: { mode: "static-output" },
|
|
384
|
-
embeddings: { apiKeyEnv: "JINA_API_KEY" },
|
|
385
|
-
}
|
|
386
|
-
});
|
|
395
|
+
<!-- Add structured metadata (filterable via search API) -->
|
|
396
|
+
<SearchSocket meta={{ version: 2, category: "api" }} />
|
|
387
397
|
```
|
|
388
398
|
|
|
389
|
-
|
|
390
|
-
- `JINA_API_KEY`
|
|
391
|
-
- `TURSO_DATABASE_URL`
|
|
392
|
-
- `TURSO_AUTH_TOKEN`
|
|
399
|
+
The component renders `<meta>` tags in `<svelte:head>` that SearchSocket reads during indexing.
|
|
393
400
|
|
|
394
|
-
###
|
|
401
|
+
### Template components
|
|
395
402
|
|
|
396
|
-
|
|
403
|
+
Copy ready-made search UI components into your project:
|
|
397
404
|
|
|
398
|
-
|
|
405
|
+
```bash
|
|
406
|
+
pnpm searchsocket add search-dialog
|
|
407
|
+
pnpm searchsocket add search-input
|
|
408
|
+
pnpm searchsocket add search-results
|
|
409
|
+
```
|
|
399
410
|
|
|
400
|
-
|
|
401
|
-
- `ensureStateDirs` — creates `.searchsocket/` state directories
|
|
402
|
-
- Markdown mirror — writes `.searchsocket/mirror/` files
|
|
403
|
-
- Local SQLite fallback — only needed when `TURSO_DATABASE_URL` is not set
|
|
411
|
+
These are Svelte 5 components copied to `src/lib/components/search/` (configurable via `--dir`). They're starting points to customize, not dependencies.
|
|
404
412
|
|
|
405
|
-
|
|
413
|
+
## Scroll-to-Text Navigation
|
|
406
414
|
|
|
407
|
-
|
|
408
|
-
|----------|---------|-------|
|
|
409
|
-
| Vercel | `adapter-auto` (default) | Serverless — use `rawConfig` + remote Turso |
|
|
410
|
-
| Netlify | `adapter-netlify` | Serverless — same as Vercel |
|
|
411
|
-
| VPS / Docker | `adapter-node` | Long-lived process — no limitations, local SQLite works |
|
|
415
|
+
When a user clicks a search result, SearchSocket scrolls them to the matching section on the destination page.
|
|
412
416
|
|
|
413
|
-
|
|
417
|
+
### Setup
|
|
414
418
|
|
|
415
|
-
|
|
419
|
+
Add the scroll handler to your root layout:
|
|
416
420
|
|
|
417
|
-
|
|
421
|
+
```svelte
|
|
422
|
+
<!-- src/routes/+layout.svelte -->
|
|
423
|
+
<script>
|
|
424
|
+
import { afterNavigate } from '$app/navigation';
|
|
425
|
+
import { searchsocketScrollToText } from 'searchsocket/sveltekit';
|
|
418
426
|
|
|
419
|
-
|
|
420
|
-
|
|
421
|
-
|
|
422
|
-
- **Task adapters**: Uses `retrieval.passage` for indexing, `retrieval.query` for search queries (LoRA task-specific adapters for better retrieval quality)
|
|
427
|
+
afterNavigate(searchsocketScrollToText);
|
|
428
|
+
</script>
|
|
429
|
+
```
|
|
423
430
|
|
|
424
|
-
### How
|
|
431
|
+
### How it works
|
|
425
432
|
|
|
426
|
-
1.
|
|
427
|
-
2.
|
|
428
|
-
3.
|
|
429
|
-
4.
|
|
430
|
-
5.
|
|
431
|
-
6.
|
|
433
|
+
1. `buildResultUrl()` encodes the section title and text snippet into the URL
|
|
434
|
+
2. On SvelteKit client-side navigation, the `afterNavigate` hook reads `_ssk`/`_sskt` params
|
|
435
|
+
3. A TreeWalker-based text mapper finds the exact position in the DOM
|
|
436
|
+
4. The page scrolls smoothly to the match
|
|
437
|
+
5. The matching text is highlighted using the [CSS Custom Highlight API](https://developer.mozilla.org/en-US/docs/Web/API/CSS_Custom_Highlight_API) (with a DOM fallback for older browsers)
|
|
438
|
+
6. On full page loads, browsers that support Text Fragments (`#:~:text=`) handle scrolling natively
|
|
432
439
|
|
|
433
|
-
|
|
440
|
+
The highlight fades after 2 seconds. Customize with CSS:
|
|
434
441
|
|
|
435
|
-
|
|
436
|
-
|
|
437
|
-
|
|
442
|
+
```css
|
|
443
|
+
::highlight(ssk-highlight) {
|
|
444
|
+
background-color: rgba(250, 204, 21, 0.4);
|
|
445
|
+
}
|
|
438
446
|
```
|
|
439
447
|
|
|
440
|
-
|
|
441
|
-
```
|
|
442
|
-
pages processed: 42
|
|
443
|
-
chunks total: 156
|
|
444
|
-
chunks changed: 156
|
|
445
|
-
embeddings created: 156
|
|
446
|
-
estimated tokens: 32,400
|
|
447
|
-
estimated cost (USD): $0.000648
|
|
448
|
-
```
|
|
448
|
+
## Search & Ranking
|
|
449
449
|
|
|
450
|
-
###
|
|
450
|
+
### Dual search
|
|
451
451
|
|
|
452
|
-
|
|
452
|
+
By default, SearchSocket runs two parallel queries — one against page-level summaries and one against individual chunks — then blends the scores:
|
|
453
453
|
|
|
454
454
|
```ts
|
|
455
455
|
export default {
|
|
456
|
-
|
|
457
|
-
|
|
456
|
+
search: {
|
|
457
|
+
dualSearch: true, // default
|
|
458
|
+
pageSearchWeight: 0.3 // weight of page results vs chunks (0-1)
|
|
459
|
+
}
|
|
458
460
|
};
|
|
459
461
|
```
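The README does not spell out the blending formula. One plausible reading of `pageSearchWeight` is a linear interpolation between the chunk-level and page-level scores, sketched below. `blendScores` is a hypothetical function and the formula is an assumption, not confirmed behavior.

```ts
// Assumed blend: pageSearchWeight = 0 uses only the chunk score, 1 only the page score.
function blendScores(chunkScore: number, pageScore: number, pageSearchWeight = 0.3): number {
  if (pageSearchWeight < 0 || pageSearchWeight > 1) {
    throw new RangeError("pageSearchWeight must be between 0 and 1");
  }
  return (1 - pageSearchWeight) * chunkScore + pageSearchWeight * pageScore;
}
```

Under this reading, the default of 0.3 keeps chunk matches dominant while still letting a strong page-level match lift a result.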
|
|
460
462
|
|
|
461
|
-
|
|
462
|
-
|
|
463
|
-
## Search & Ranking
|
|
464
|
-
|
|
465
|
-
### Page Aggregation
|
|
463
|
+
### Page aggregation
|
|
466
464
|
|
|
467
|
-
|
|
465
|
+
With `groupBy: "page"` (default), chunk results are grouped by page URL:
|
|
468
466
|
|
|
469
467
|
1. The top chunk score becomes the base page score
|
|
470
|
-
2. Additional matching chunks
|
|
471
|
-
3.
|
|
468
|
+
2. Additional matching chunks add a decaying bonus: `chunk_score * decay^i`
|
|
469
|
+
3. Per-URL page weights are applied multiplicatively
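The three steps above can be sketched as a single function. This is a rough illustration under stated assumptions: `aggregatePageScore` and `AggregationOptions` are hypothetical names, the cap is assumed to count the top chunk, and the bonus is assumed to be scaled by the `weights.aggregation` value before being added to the base score.

```ts
interface AggregationOptions {
  aggregationCap: number;    // max chunks contributing to the page score
  aggregationDecay: number;  // decay applied to each additional chunk
  aggregationWeight: number; // assumed scale for the bonus (weights.aggregation)
  pageWeight: number;        // per-URL multiplier
}

function aggregatePageScore(chunkScores: number[], opts: AggregationOptions): number {
  // Highest score first, limited to the chunks allowed to contribute.
  const sorted = [...chunkScores].sort((a, b) => b - a).slice(0, opts.aggregationCap);
  const top = sorted[0];
  if (top === undefined) return 0;

  // Step 2: each additional chunk adds chunk_score * decay^i (i starts at 1).
  const bonus = sorted
    .slice(1)
    .reduce((sum, score, idx) => sum + score * Math.pow(opts.aggregationDecay, idx + 1), 0);

  // Step 1 (base) plus the weighted bonus, then step 3 (multiplicative page weight).
  return (top + opts.aggregationWeight * bonus) * opts.pageWeight;
}
```

For example, chunk scores of 0.8, 0.6, and 0.4 with a decay of 0.5 give a bonus of 0.6 × 0.5 + 0.4 × 0.25 = 0.4 before weighting.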
|
|
472
470
|
|
|
473
|
-
|
|
471
|
+
### Ranking configuration
|
|
474
472
|
|
|
475
473
|
```ts
|
|
476
474
|
export default {
|
|
477
475
|
ranking: {
|
|
478
|
-
|
|
479
|
-
|
|
480
|
-
|
|
481
|
-
|
|
482
|
-
|
|
483
|
-
|
|
476
|
+
enableIncomingLinkBoost: true, // boost pages with more internal links pointing to them
|
|
477
|
+
enableDepthBoost: true, // boost shallower pages (/ > /docs > /docs/api)
|
|
478
|
+
enableFreshnessBoost: false, // boost recently published content
|
|
479
|
+
enableAnchorTextBoost: false, // boost pages whose link text matches the query
|
|
480
|
+
|
|
481
|
+
pageWeights: { // per-URL score multipliers (prefix matching)
|
|
482
|
+
"/": 0.95,
|
|
484
483
|
"/docs": 1.15,
|
|
485
|
-
"/download": 1.
|
|
484
|
+
"/download": 1.05
|
|
486
485
|
},
|
|
486
|
+
|
|
487
|
+
aggregationCap: 5, // max chunks contributing to page score
|
|
488
|
+
aggregationDecay: 0.5, // decay for additional chunks
|
|
489
|
+
minScoreRatio: 0.70, // drop results below 70% of best score
|
|
490
|
+
scoreGapThreshold: 0.4, // trim results >40% below best
|
|
491
|
+
minChunkScoreRatio: 0.5, // threshold for sub-chunks
|
|
492
|
+
|
|
487
493
|
weights: {
|
|
488
|
-
|
|
489
|
-
|
|
490
|
-
|
|
491
|
-
|
|
494
|
+
incomingLinks: 0.05,
|
|
495
|
+
depth: 0.03,
|
|
496
|
+
aggregation: 0.1,
|
|
497
|
+
titleMatch: 0.15,
|
|
498
|
+
freshness: 0.1,
|
|
499
|
+
anchorText: 0.10
|
|
492
500
|
}
|
|
493
501
|
}
|
|
494
502
|
};
|
|
495
503
|
```
|
|
496
504
|
|
|
497
|
-
|
|
498
|
-
|
|
499
|
-
`minScore` filters out low-relevance results before they reach the client. Set to a value like `0.3` to remove noise. In page mode, pages below the threshold are dropped; in chunk mode, individual chunks are filtered. Default is `0` (disabled).
|
|
500
|
-
|
|
501
|
-
### Chunk Mode
|
|
502
|
-
|
|
503
|
-
Use `groupBy: "chunk"` for flat per-chunk results without page aggregation:
|
|
504
|
-
|
|
505
|
-
```bash
|
|
506
|
-
curl -X POST http://localhost:5173/api/search \
|
|
507
|
-
-H "content-type: application/json" \
|
|
508
|
-
-d '{"q":"vector search","topK":10,"groupBy":"chunk"}'
|
|
509
|
-
```
|
|
505
|
+
Use gentle `pageWeights` values (0.9–1.2) since they compound with other boosts.
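Two of the settings above lend themselves to a short illustration: prefix-matched `pageWeights` and the `minScoreRatio` cutoff. The sketch below is not SearchSocket's code. `resolvePageWeight` and `filterByScoreRatio` are hypothetical helpers, and choosing the longest matching prefix is an assumption about how overlapping entries are resolved.

```ts
// Assumed rule: the longest configured prefix that matches the URL wins.
// Plain startsWith is a simplification: "/docs" would also match "/docsify".
function resolvePageWeight(url: string, pageWeights: Record<string, number>): number {
  let bestLength = -1;
  let weight = 1; // neutral multiplier when nothing matches
  for (const [prefix, value] of Object.entries(pageWeights)) {
    if (url.startsWith(prefix) && prefix.length > bestLength) {
      bestLength = prefix.length;
      weight = value;
    }
  }
  return weight;
}

// Drops results scoring below `minScoreRatio` times the best score.
function filterByScoreRatio<T extends { score: number }>(results: T[], minScoreRatio = 0.7): T[] {
  if (results.length === 0) return [];
  const best = results.reduce((max, r) => Math.max(max, r.score), -Infinity);
  return results.filter((r) => r.score >= best * minScoreRatio);
}
```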
|
|
510
506
|
|
|
511
507
|
## Build-Triggered Indexing
|
|
512
508
|
|
|
513
|
-
|
|
509
|
+
The recommended workflow is to index automatically on every deploy. Add the Vite plugin to your config:
|
|
514
510
|
|
|
515
|
-
**`vite.config.ts` or `svelte.config.js`:**
|
|
516
511
|
```ts
|
|
512
|
+
// vite.config.ts
|
|
513
|
+
import { sveltekit } from "@sveltejs/kit/vite";
|
|
517
514
|
import { searchsocketVitePlugin } from "searchsocket/sveltekit";
|
|
518
515
|
|
|
519
516
|
export default {
|
|
520
517
|
plugins: [
|
|
521
|
-
|
|
518
|
+
sveltekit(),
|
|
522
519
|
searchsocketVitePlugin({
|
|
523
|
-
|
|
524
|
-
|
|
525
|
-
verbose: false
|
|
520
|
+
changedOnly: true, // incremental indexing (default)
|
|
521
|
+
verbose: true
|
|
526
522
|
})
|
|
527
523
|
]
|
|
528
524
|
};
|
|
529
525
|
```
|
|
530
526
|
|
|
531
|
-
|
|
527
|
+
### Vercel / Cloudflare / Netlify
|
|
528
|
+
|
|
529
|
+
Set these environment variables in your hosting platform:
|
|
530
|
+
|
|
531
|
+
| Variable | Value |
|
|
532
|
+
|----------|-------|
|
|
533
|
+
| `UPSTASH_VECTOR_REST_URL` | Your Upstash Vector REST URL |
|
|
534
|
+
| `UPSTASH_VECTOR_REST_TOKEN` | Your Upstash Vector REST token |
|
|
535
|
+
| `SEARCHSOCKET_AUTO_INDEX` | `1` |
|
|
536
|
+
|
|
537
|
+
Every deploy will build your site, index the content into Upstash, and serve the search API and MCP endpoint — fully automated.
|
|
538
|
+
|
|
539
|
+
### Environment variable control
|
|
540
|
+
|
|
532
541
|
```bash
|
|
533
|
-
# Enable
|
|
542
|
+
# Enable indexing on build
|
|
534
543
|
SEARCHSOCKET_AUTO_INDEX=1 pnpm build
|
|
535
544
|
|
|
536
|
-
# Disable
|
|
545
|
+
# Disable temporarily
|
|
537
546
|
SEARCHSOCKET_DISABLE_AUTO_INDEX=1 pnpm build
|
|
547
|
+
|
|
548
|
+
# Force full rebuild (ignore incremental cache)
|
|
549
|
+
SEARCHSOCKET_FORCE_REINDEX=1 pnpm build
|
|
550
|
+
```
|
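The three variables can be read as one small decision: skip indexing, index incrementally, or rebuild everything. The sketch below is one way to express that, assuming the disable flag takes precedence and that the force flag only changes the mode when auto-indexing is enabled. `decideIndexing` and `isEnabled` are hypothetical names, and the plugin's actual logic may differ.

```ts
type Env = Record<string, string | undefined>;
type IndexDecision = "skip" | "incremental" | "full";

// Assumption: "1", "true", and "yes" count as enabled (compared case-insensitively here).
function isEnabled(value: string | undefined): boolean {
  if (value === undefined) return false;
  const normalized = value.trim().toLowerCase();
  return normalized === "1" || normalized === "true" || normalized === "yes";
}

// Assumption: DISABLE wins over AUTO_INDEX, and FORCE_REINDEX only selects a full rebuild.
function decideIndexing(env: Env): IndexDecision {
  if (isEnabled(env.SEARCHSOCKET_DISABLE_AUTO_INDEX)) return "skip";
  if (!isEnabled(env.SEARCHSOCKET_AUTO_INDEX)) return "skip";
  return isEnabled(env.SEARCHSOCKET_FORCE_REINDEX) ? "full" : "incremental";
}
```

Passing the environment in as a plain object keeps the decision easy to test without touching the real process environment.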
|
551
|
+
|
|
552
|
+
## Making Images Searchable
|
|
553
|
+
|
|
554
|
+
SearchSocket converts images to text during extraction using this priority chain:
|
|
555
|
+
|
|
556
|
+
1. `data-search-description` on the `<img>` — your explicit description
|
|
557
|
+
2. `data-search-description` on the parent `<figure>`
|
|
558
|
+
3. `alt` text + `<figcaption>` combined
|
|
559
|
+
4. `alt` text alone (filters generic words like "image", "icon")
|
|
560
|
+
5. `<figcaption>` alone
|
|
561
|
+
6. Removed — images with no useful text are dropped
|
|
562
|
+
|
|
563
|
+
```html
|
|
564
|
+
<img
|
|
565
|
+
src="/screenshots/settings.png"
|
|
566
|
+
alt="Settings page"
|
|
567
|
+
data-search-description="The settings page showing API key configuration, theme selection, and notification preferences"
|
|
568
|
+
/>
|
|
569
|
+
```
|
|
570
|
+
|
|
571
|
+
Works with SvelteKit's `enhanced:img`:
|
|
572
|
+
|
|
573
|
+
```svelte
|
|
574
|
+
<enhanced:img
|
|
575
|
+
src="./screenshots/dashboard.png"
|
|
576
|
+
alt="Dashboard"
|
|
577
|
+
data-search-description="Main dashboard showing active projects and indexing status"
|
|
578
|
+
/>
|
|
579
|
+
```
|
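The priority chain above can be sketched as a single fallback function. This is an illustration only: the `ImageInfo` shape, the `resolveImageText` name, the `": "` separator used to combine `alt` and `<figcaption>`, and the rule that an `alt` consisting solely of a generic word is ignored are all assumptions, not SearchSocket's actual extraction code.

```ts
// Hypothetical input: text already read from the DOM for one image.
interface ImageInfo {
  imgDescription?: string;    // data-search-description on the <img>
  figureDescription?: string; // data-search-description on the parent <figure>
  alt?: string;
  figcaption?: string;
}

// Assumption: only the two generic words named in the list above are filtered.
const GENERIC_ALT_WORDS: string[] = ["image", "icon"];

function clean(value: string | undefined): string {
  return (value === undefined ? "" : value).trim();
}

// Returns the text to index for an image, or null if the image should be dropped.
function resolveImageText(info: ImageInfo): string | null {
  const imgDescription = clean(info.imgDescription);
  if (imgDescription) return imgDescription;               // 1. explicit description on <img>

  const figureDescription = clean(info.figureDescription);
  if (figureDescription) return figureDescription;         // 2. description on <figure>

  const rawAlt = clean(info.alt);
  const alt = GENERIC_ALT_WORDS.indexOf(rawAlt.toLowerCase()) === -1 ? rawAlt : "";
  const caption = clean(info.figcaption);

  if (alt && caption) return alt + ": " + caption;         // 3. alt + figcaption combined
  if (alt) return alt;                                     // 4. alt alone
  if (caption) return caption;                             // 5. figcaption alone
  return null;                                             // 6. nothing useful: drop the image
}
```

Returning `null` for the last case makes the "removed" outcome explicit, so a caller can skip the image instead of indexing an empty string.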
|
580
|
+
|
|
581
|
+
## MCP Server
|
|
582
|
+
|
|
583
|
+
SearchSocket includes an MCP server that gives Claude Code, Claude Desktop, and other MCP clients direct access to your site's search index. The MCP endpoint is built into `searchsocketHandle()` — once your site is deployed, any MCP client can connect to it over HTTP.
|
|
584
|
+
|
|
585
|
+
### Available tools
|
|
586
|
+
|
|
587
|
+
| Tool | Description |
|
|
588
|
+
|------|-------------|
|
|
589
|
+
| `search` | Semantic search with filtering, grouping, and reranking |
|
|
590
|
+
| `get_page` | Retrieve full page markdown with frontmatter |
|
|
591
|
+
| `list_pages` | Cursor-paginated page listing |
|
|
592
|
+
| `get_site_structure` | Hierarchical page tree |
|
|
593
|
+
| `find_source_file` | Locate the SvelteKit source file for content |
|
|
594
|
+
| `get_related_pages` | Find related pages by links, semantics, and structure |
|
|
595
|
+
|
|
596
|
+
### Connecting to your deployed site
|
|
597
|
+
|
|
598
|
+
The recommended setup is to connect Claude Code to your deployed site's MCP endpoint. This way the index stays up to date automatically as you deploy, and there's no local process to manage.
|
|
599
|
+
|
|
600
|
+
Add `.mcp.json` to your project root:
|
|
601
|
+
|
|
602
|
+
```json
|
|
603
|
+
{
|
|
604
|
+
"mcpServers": {
|
|
605
|
+
"searchsocket": {
|
|
606
|
+
"type": "http",
|
|
607
|
+
"url": "https://your-site.com/api/mcp"
|
|
608
|
+
}
|
|
609
|
+
}
|
|
610
|
+
}
|
|
538
611
|
```
|
|
539
612
|
|
|
540
|
-
|
|
613
|
+
That's it. Restart Claude Code and the six search tools are available. You can search your docs, retrieve page content, and find source files directly from the AI assistant.
|
|
541
614
|
|
|
542
|
-
|
|
615
|
+
To protect the endpoint, add API key authentication:
|
|
543
616
|
|
|
617
|
+
```ts
|
|
618
|
+
// src/hooks.server.ts
import { searchsocketHandle } from "searchsocket/sveltekit";
|
|
619
|
+
export const handle = searchsocketHandle({
|
|
620
|
+
rawConfig: {
|
|
621
|
+
mcp: {
|
|
622
|
+
handle: {
|
|
623
|
+
apiKey: process.env.SEARCHSOCKET_MCP_API_KEY
|
|
624
|
+
}
|
|
625
|
+
}
|
|
626
|
+
}
|
|
627
|
+
});
|
|
544
628
|
```
|
|
545
|
-
|
|
629
|
+
|
|
630
|
+
Then pass the key in `.mcp.json`:
|
|
631
|
+
|
|
632
|
+
```json
|
|
633
|
+
{
|
|
634
|
+
"mcpServers": {
|
|
635
|
+
"searchsocket": {
|
|
636
|
+
"type": "http",
|
|
637
|
+
"url": "https://your-site.com/api/mcp",
|
|
638
|
+
"headers": {
|
|
639
|
+
"Authorization": "Bearer ${SEARCHSOCKET_MCP_API_KEY}"
|
|
640
|
+
}
|
|
641
|
+
}
|
|
642
|
+
}
|
|
643
|
+
}
|
|
644
|
+
```
|
|
645
|
+
|
|
646
|
+
The `${SEARCHSOCKET_MCP_API_KEY}` syntax references an environment variable so you don't hardcode secrets in `.mcp.json`.
|
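To make the substitution concrete, the sketch below expands `${NAME}` references in a string from a map of environment values. It is a generic illustration, not the MCP client's implementation: `expandEnvRefs` is a hypothetical helper, and throwing on a missing variable is an assumption about how such a reference might be handled.

```ts
// Hypothetical helper: replaces every ${NAME} with the matching value from `env`.
function expandEnvRefs(template: string, env: Record<string, string | undefined>): string {
  return template.replace(/\$\{([A-Za-z_][A-Za-z0-9_]*)\}/g, (_match: string, name: string) => {
    const value = env[name];
    if (value === undefined) {
      // Assumption: fail loudly rather than send a header with an empty key.
      throw new Error("Missing environment variable: " + name);
    }
    return value;
  });
}

// Example values only; a real key would come from the shell environment.
const header = expandEnvRefs("Bearer ${SEARCHSOCKET_MCP_API_KEY}", {
  SEARCHSOCKET_MCP_API_KEY: "example-key"
});
console.log(header);
```

Keeping the reference in `.mcp.json` and the value in the environment means the file can be committed without exposing the key.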
|
647
|
+
|
|
648
|
+
### Auto-approving in Claude Code
|
|
649
|
+
|
|
650
|
+
Skip the approval prompt each time a tool is called:
|
|
651
|
+
|
|
652
|
+
```json
|
|
653
|
+
{
|
|
654
|
+
"allowedMcpServers": [
|
|
655
|
+
{ "serverName": "searchsocket" }
|
|
656
|
+
]
|
|
657
|
+
}
|
|
546
658
|
```
|
|
547
659
|
|
|
548
|
-
|
|
660
|
+
Add this to `.claude/settings.json` in your project.
|
|
661
|
+
|
|
662
|
+
### Local development
|
|
663
|
+
|
|
664
|
+
During local development, you can point to your dev server instead:
|
|
665
|
+
|
|
666
|
+
```json
|
|
667
|
+
{
|
|
668
|
+
"mcpServers": {
|
|
669
|
+
"searchsocket": {
|
|
670
|
+
"type": "http",
|
|
671
|
+
"url": "http://localhost:5173/api/mcp"
|
|
672
|
+
}
|
|
673
|
+
}
|
|
674
|
+
}
|
|
549
675
|
```
|
|
550
|
-
|
|
676
|
+
|
|
677
|
+
### Claude Desktop
|
|
678
|
+
|
|
679
|
+
Add to `~/Library/Application Support/Claude/claude_desktop_config.json` (the macOS config path):
|
|
680
|
+
|
|
681
|
+
```json
|
|
682
|
+
{
|
|
683
|
+
"mcpServers": {
|
|
684
|
+
"searchsocket": {
|
|
685
|
+
"command": "npx",
|
|
686
|
+
"args": ["searchsocket", "mcp"],
|
|
687
|
+
"cwd": "/path/to/your/project"
|
|
688
|
+
}
|
|
689
|
+
}
|
|
690
|
+
}
|
|
551
691
|
```
|
|
552
692
|
|
|
553
|
-
|
|
554
|
-
- Frontmatter: URL, title, scope, route file, metadata
|
|
555
|
-
- Markdown: Extracted content
|
|
693
|
+
### Standalone HTTP server
|
|
556
694
|
|
|
557
|
-
|
|
558
|
-
- Content workflows (edit markdown, regenerate embeddings)
|
|
559
|
-
- Version control for indexed content
|
|
560
|
-
- Debugging (see exactly what was indexed)
|
|
561
|
-
- Offline search (grep the mirror)
|
|
695
|
+
Run the MCP server as a standalone process (outside SvelteKit):
|
|
562
696
|
|
|
563
|
-
|
|
697
|
+
```bash
|
|
698
|
+
pnpm searchsocket mcp --transport http --port 3338
|
|
564
699
|
```
|
|
565
|
-
|
|
700
|
+
|
|
701
|
+
## llms.txt Generation
|
|
702
|
+
|
|
703
|
+
Generate [llms.txt](https://llmstxt.org/) files during indexing — a standardized way to make your site content available to LLMs.
|
|
704
|
+
|
|
705
|
+
```ts
|
|
706
|
+
export default {
|
|
707
|
+
project: {
|
|
708
|
+
baseUrl: "https://example.com"
|
|
709
|
+
},
|
|
710
|
+
llmsTxt: {
|
|
711
|
+
enable: true,
|
|
712
|
+
title: "My Project",
|
|
713
|
+
description: "Documentation for My Project",
|
|
714
|
+
outputPath: "static/llms.txt", // default
|
|
715
|
+
generateFull: true, // also generate llms-full.txt
|
|
716
|
+
serveMarkdownVariants: false // serve /page.md variants via the hook
|
|
717
|
+
}
|
|
718
|
+
};
|
|
566
719
|
```
|
|
567
720
|
|
|
568
|
-
|
|
721
|
+
After indexing, `llms.txt` (page index with links) and `llms-full.txt` (full content) are written to your static directory and served by `searchsocketHandle()`.
|
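As a rough illustration of the index file, the sketch below renders a page list in the layout described at llmstxt.org: an H1 title, a blockquote summary, then a section of Markdown links. `renderLlmsTxt`, the `LlmsPage` shape, and the "Pages" section heading are hypothetical; SearchSocket's actual output may be organised differently.

```ts
// Hypothetical page record used only for this sketch.
interface LlmsPage {
  title: string;
  url: string;
  description?: string;
}

// Builds an llms.txt-style index: title, summary, then one link per page.
function renderLlmsTxt(title: string, description: string, pages: LlmsPage[]): string {
  const lines: string[] = ["# " + title, "", "> " + description, "", "## Pages", ""];
  for (const page of pages) {
    const suffix = page.description ? ": " + page.description : "";
    lines.push("- [" + page.title + "](" + page.url + ")" + suffix);
  }
  return lines.join("\n") + "\n";
}

console.log(
  renderLlmsTxt("My Project", "Documentation for My Project", [
    { title: "Home", url: "https://example.com/" },
    { title: "Docs", url: "https://example.com/docs", description: "Guides" }
  ])
);
```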
|
722
|
+
|
|
723
|
+
## CLI Commands
|
|
569
724
|
|
|
570
725
|
### `searchsocket init`
|
|
571
726
|
|
|
572
|
-
Initialize config and state directory.
|
|
727
|
+
Initialize config and state directory. Creates `searchsocket.config.ts`, `.searchsocket/`, `.mcp.json`, and wires up your hooks and Vite config.
|
|
573
728
|
|
|
574
729
|
```bash
|
|
575
730
|
pnpm searchsocket init
|
|
731
|
+
pnpm searchsocket init --non-interactive
|
|
576
732
|
```
|
|
577
733
|
|
|
578
734
|
### `searchsocket index`
|
|
579
735
|
|
|
580
|
-
Index content into
|
|
736
|
+
Index content into Upstash Vector.
|
|
581
737
|
|
|
582
738
|
```bash
|
|
583
|
-
#
|
|
584
|
-
pnpm searchsocket index --
|
|
739
|
+
pnpm searchsocket index # incremental (default: --changed-only)
|
|
740
|
+
pnpm searchsocket index --force # full re-index
|
|
741
|
+
pnpm searchsocket index --source build # override source mode
|
|
742
|
+
pnpm searchsocket index --scope staging # override scope
|
|
743
|
+
pnpm searchsocket index --dry-run # preview without writing
|
|
744
|
+
pnpm searchsocket index --max-pages 10 # limit for testing
|
|
745
|
+
pnpm searchsocket index --verbose # detailed output
|
|
746
|
+
pnpm searchsocket index --json # machine-readable output
|
|
747
|
+
```
|
|
585
748
|
|
|
586
|
-
|
|
587
|
-
pnpm searchsocket index --force
|
|
749
|
+
### `searchsocket search`
|
|
588
750
|
|
|
589
|
-
|
|
590
|
-
pnpm searchsocket index --dry-run
|
|
751
|
+
Search the index from the command line to test results.
|
|
591
752
|
|
|
592
|
-
|
|
593
|
-
pnpm searchsocket
|
|
753
|
+
```bash
|
|
754
|
+
pnpm searchsocket search --q "getting started" --top-k 5
|
|
755
|
+
pnpm searchsocket search --q "api" --path-prefix /docs
|
|
756
|
+
```
|
|
594
757
|
|
|
595
|
-
|
|
596
|
-
pnpm searchsocket index --max-pages 10 --max-chunks 50
|
|
758
|
+
### `searchsocket dev`
|
|
597
759
|
|
|
598
|
-
|
|
599
|
-
pnpm searchsocket index --scope staging
|
|
760
|
+
Watch for file changes and auto-reindex, with optional playground UI.
|
|
600
761
|
|
|
601
|
-
|
|
602
|
-
pnpm searchsocket
|
|
762
|
+
```bash
|
|
763
|
+
pnpm searchsocket dev # watch + playground at :3337
|
|
764
|
+
pnpm searchsocket dev --mcp --mcp-port 3338 # also start MCP HTTP server
|
|
765
|
+
pnpm searchsocket dev --no-playground # watch only
|
|
603
766
|
```
|
|
604
767
|
|
|
605
768
|
### `searchsocket status`
|
|
606
769
|
|
|
607
|
-
Show indexing status
|
|
770
|
+
Show indexing status and backend health.
|
|
608
771
|
|
|
609
772
|
```bash
|
|
610
773
|
pnpm searchsocket status
|
|
774
|
+
```
|
|
775
|
+
|
|
776
|
+
### `searchsocket doctor`
|
|
777
|
+
|
|
778
|
+
Validate config, env vars, provider connectivity, and write access.
|
|
611
779
|
|
|
612
|
-
|
|
613
|
-
|
|
614
|
-
# resolved scope: main
|
|
615
|
-
# embedding model: jina-embeddings-v5-text-small
|
|
616
|
-
# vector backend: turso/libsql (local (.searchsocket/vectors.db))
|
|
617
|
-
# vector health: ok
|
|
618
|
-
# last indexed (main): 2025-02-23T10:30:00Z
|
|
619
|
-
# tracked chunks: 156
|
|
620
|
-
# last estimated tokens: 32,400
|
|
621
|
-
# last estimated cost: $0.000648
|
|
780
|
+
```bash
|
|
781
|
+
pnpm searchsocket doctor
|
|
622
782
|
```
|
|
623
783
|
|
|
624
|
-
### `searchsocket
|
|
784
|
+
### `searchsocket test`
|
|
625
785
|
|
|
626
|
-
|
|
786
|
+
Run search quality assertions against the live index.
|
|
627
787
|
|
|
628
788
|
```bash
|
|
629
|
-
pnpm searchsocket
|
|
789
|
+
pnpm searchsocket test # uses searchsocket.test.json
|
|
790
|
+
pnpm searchsocket test --file custom-tests.json # custom test file
|
|
791
|
+
```
|
|
792
|
+
|
|
793
|
+
Test file format:
|
|
630
794
|
|
|
631
|
-
|
|
632
|
-
|
|
795
|
+
```json
|
|
796
|
+
[
|
|
797
|
+
{
|
|
798
|
+
"query": "installation guide",
|
|
799
|
+
"expect": {
|
|
800
|
+
"topResult": "/docs/getting-started",
|
|
801
|
+
"inTop5": ["/docs/getting-started", "/docs/quickstart"]
|
|
802
|
+
}
|
|
803
|
+
}
|
|
804
|
+
]
|
|
633
805
|
```
|
|
634
806
|
|
|
635
|
-
|
|
636
|
-
- `src/routes/**` (route files)
|
|
637
|
-
- `build/` (if static-output mode)
|
|
638
|
-
- Build output dir (if build mode)
|
|
639
|
-
- Content files (if content-files mode)
|
|
640
|
-
- `searchsocket.config.ts` (if crawl or build mode)
|
|
807
|
+
Reports pass/fail per assertion and Mean Reciprocal Rank (MRR) across all queries.
|
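Mean Reciprocal Rank averages `1 / rank` of the expected page over all queries, counting a miss as 0. The sketch below shows that calculation, assuming the expected page for each query is the test's `topResult`. The `RankedRun` type and both function names are hypothetical and are not taken from the SearchSocket CLI.

```ts
// Hypothetical record: the URLs a query returned, best first, plus the page we expected.
interface RankedRun {
  rankedUrls: string[];
  expectedUrl: string;
}

// 1 / rank of the expected URL (rank starts at 1), or 0 if it was not returned.
function reciprocalRank(run: RankedRun): number {
  const index = run.rankedUrls.indexOf(run.expectedUrl);
  return index === -1 ? 0 : 1 / (index + 1);
}

// Average of the reciprocal ranks; defined as 0 for an empty test file.
function meanReciprocalRank(runs: RankedRun[]): number {
  if (runs.length === 0) return 0;
  const total = runs.reduce((sum, run) => sum + reciprocalRank(run), 0);
  return total / runs.length;
}
```

For example, three queries whose expected pages rank first, second, and not at all give an MRR of (1 + 0.5 + 0) / 3 = 0.5.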
|
641
808
|
|
|
642
809
|
### `searchsocket clean`
|
|
643
810
|
|
|
644
|
-
Delete local state and optionally remote
|
|
811
|
+
Delete local state and optionally remote indexes.
|
|
645
812
|
|
|
646
813
|
```bash
|
|
647
|
-
#
|
|
648
|
-
pnpm searchsocket clean
|
|
649
|
-
|
|
650
|
-
# Local + remote vectors
|
|
651
|
-
pnpm searchsocket clean --remote --scope staging
|
|
814
|
+
pnpm searchsocket clean # local state only
|
|
815
|
+
pnpm searchsocket clean --remote # also delete remote scope
|
|
816
|
+
pnpm searchsocket clean --scope staging # specific scope
|
|
652
817
|
```
|
|
653
818
|
|
|
654
819
|
### `searchsocket prune`
|
|
655
820
|
|
|
656
|
-
|
|
821
|
+
List and delete stale scopes. Compares against git branches to find orphaned scopes.
|
|
657
822
|
|
|
658
823
|
```bash
|
|
659
|
-
#
|
|
660
|
-
pnpm searchsocket prune --
|
|
824
|
+
pnpm searchsocket prune # dry-run (default)
|
|
825
|
+
pnpm searchsocket prune --apply # actually delete
|
|
826
|
+
pnpm searchsocket prune --older-than 30d # only scopes older than 30 days
|
|
827
|
+
```
|
|
661
828
|
|
|
662
|
-
|
|
663
|
-
|
|
829
|
+
### `searchsocket mcp`
|
|
830
|
+
|
|
831
|
+
Run the MCP server standalone.
|
|
664
832
|
|
|
665
|
-
|
|
666
|
-
pnpm searchsocket
|
|
833
|
+
```bash
|
|
834
|
+
pnpm searchsocket mcp # stdio (default)
|
|
835
|
+
pnpm searchsocket mcp --transport http --port 3338 # HTTP
|
|
836
|
+
pnpm searchsocket mcp --access public --api-key SECRET # public with auth
|
|
667
837
|
```
|
|
668
838
|
|
|
669
|
-
### `searchsocket
|
|
839
|
+
### `searchsocket add`
|
|
670
840
|
|
|
671
|
-
|
|
841
|
+
Copy Svelte 5 search UI template components into your project.
|
|
672
842
|
|
|
673
843
|
```bash
|
|
674
|
-
pnpm searchsocket
|
|
675
|
-
|
|
676
|
-
|
|
677
|
-
#
|
|
678
|
-
# PASS env JINA_API_KEY
|
|
679
|
-
# PASS turso/libsql (local file: .searchsocket/vectors.db)
|
|
680
|
-
# PASS source: build manifest
|
|
681
|
-
# PASS source: vite binary
|
|
682
|
-
# PASS embedding provider connectivity
|
|
683
|
-
# PASS vector backend connectivity
|
|
684
|
-
# PASS vector backend write permission
|
|
685
|
-
# PASS state directory writable
|
|
844
|
+
pnpm searchsocket add search-dialog
|
|
845
|
+
pnpm searchsocket add search-input
|
|
846
|
+
pnpm searchsocket add search-results
|
|
847
|
+
pnpm searchsocket add search-dialog --dir src/lib/components/ui # custom dir
|
|
686
848
|
```
|
|
687
849
|
|
|
688
|
-
|
|
850
|
+
## Real-World Example
|
|
689
851
|
|
|
690
|
-
|
|
852
|
+
Here's how [Canopy](https://canopy.dev) integrates SearchSocket into a production SvelteKit site.
|
|
691
853
|
|
|
692
|
-
|
|
693
|
-
# stdio transport (default)
|
|
694
|
-
pnpm searchsocket mcp
|
|
854
|
+
### Configuration
|
|
695
855
|
|
|
696
|
-
|
|
697
|
-
|
|
856
|
+
```ts
|
|
857
|
+
// searchsocket.config.ts
|
|
858
|
+
export default {
|
|
859
|
+
project: {
|
|
860
|
+
id: "canopy-website",
|
|
861
|
+
baseUrl: "https://canopy.dev"
|
|
862
|
+
},
|
|
863
|
+
source: {
|
|
864
|
+
mode: "build"
|
|
865
|
+
},
|
|
866
|
+
extract: {
|
|
867
|
+
dropSelectors: [".nav-blur", ".mobile-overlay", ".docs-sidebar"]
|
|
868
|
+
},
|
|
869
|
+
ranking: {
|
|
870
|
+
minScoreRatio: 0.70,
|
|
871
|
+
pageWeights: {
|
|
872
|
+
"/": 0.95,
|
|
873
|
+
"/download": 1.05,
|
|
874
|
+
"/docs/**": 1.05
|
|
875
|
+
},
|
|
876
|
+
aggregationCap: 3,
|
|
877
|
+
aggregationDecay: 0.3
|
|
878
|
+
}
|
|
879
|
+
};
|
|
698
880
|
```
|
|
699
881
|
|
|
700
|
-
###
|
|
882
|
+
### Server hook
|
|
701
883
|
|
|
702
|
-
|
|
884
|
+
```ts
|
|
885
|
+
// src/hooks.server.ts
|
|
886
|
+
import { searchsocketHandle } from "searchsocket/sveltekit";
|
|
887
|
+
import { env } from "$env/dynamic/private";
|
|
703
888
|
|
|
704
|
-
|
|
705
|
-
|
|
889
|
+
export const handle = searchsocketHandle({
|
|
890
|
+
rawConfig: {
|
|
891
|
+
project: { id: "canopy-website", baseUrl: "https://canopy.dev" },
|
|
892
|
+
source: { mode: "build" },
|
|
893
|
+
upstash: {
|
|
894
|
+
url: env.UPSTASH_VECTOR_REST_URL,
|
|
895
|
+
token: env.UPSTASH_VECTOR_REST_TOKEN
|
|
896
|
+
},
|
|
897
|
+
extract: {
|
|
898
|
+
dropSelectors: [".nav-blur", ".mobile-overlay", ".docs-sidebar"]
|
|
899
|
+
},
|
|
900
|
+
ranking: {
|
|
901
|
+
minScoreRatio: 0.70,
|
|
902
|
+
pageWeights: { "/": 0.95, "/download": 1.05, "/docs/**": 1.05 },
|
|
903
|
+
aggregationCap: 3,
|
|
904
|
+
aggregationDecay: 0.3
|
|
905
|
+
}
|
|
906
|
+
}
|
|
907
|
+
});
|
|
908
|
+
```
|
|
909
|
+
|
|
910
|
+
### Search modal with scoped search
|
|
911
|
+
|
|
912
|
+
```svelte
|
|
913
|
+
<!-- SearchModal.svelte -->
|
|
914
|
+
<script>
|
|
915
|
+
import { createSearchClient, buildResultUrl } from "searchsocket/client";
|
|
916
|
+
|
|
917
|
+
let { open = $bindable(false), pathPrefix = "", placeholder = "Search..." } = $props();
|
|
918
|
+
|
|
919
|
+
const client = createSearchClient();
|
|
920
|
+
let query = $state("");
|
|
921
|
+
let results = $state([]);
|
|
922
|
+
|
|
923
|
+
async function doSearch() {
|
|
924
|
+
if (!query.trim()) { results = []; return; }
|
|
925
|
+
const res = await client.search({
|
|
926
|
+
q: query,
|
|
927
|
+
topK: 8,
|
|
928
|
+
groupBy: "page",
|
|
929
|
+
pathPrefix: pathPrefix || undefined
|
|
930
|
+
});
|
|
931
|
+
results = res.results;
|
|
932
|
+
}
|
|
933
|
+
</script>
|
|
934
|
+
|
|
935
|
+
{#if open}
|
|
936
|
+
<dialog open>
|
|
937
|
+
<input bind:value={query} oninput={doSearch} {placeholder} />
|
|
938
|
+
{#each results as result}
|
|
939
|
+
<a href={buildResultUrl(result)} onclick={() => open = false}>
|
|
940
|
+
<strong>{result.title}</strong>
|
|
941
|
+
{#if result.sectionTitle}<span>— {result.sectionTitle}</span>{/if}
|
|
942
|
+
<p>{result.snippet}</p>
|
|
943
|
+
</a>
|
|
944
|
+
{/each}
|
|
945
|
+
</dialog>
|
|
946
|
+
{/if}
|
|
706
947
|
```
|
|
707
948
|
|
|
708
|
-
|
|
949
|
+
### Scroll-to-text in layout
|
|
709
950
|
|
|
710
|
-
|
|
951
|
+
```svelte
|
|
952
|
+
<!-- src/routes/+layout.svelte -->
|
|
953
|
+
<script>
|
|
954
|
+
import { afterNavigate } from "$app/navigation";
|
|
955
|
+
import { searchsocketScrollToText } from "searchsocket/sveltekit";
|
|
711
956
|
|
|
712
|
-
|
|
957
|
+
afterNavigate(searchsocketScrollToText);
|
|
958
|
+
</script>
|
|
959
|
+
```
|
|
713
960
|
|
|
714
|
-
|
|
715
|
-
- Semantic search across indexed content
|
|
716
|
-
- Returns ranked results with URL, title, snippet, score, and routeFile
|
|
717
|
-
- Options: `scope`, `topK` (1-100), `pathPrefix`, `tags`, `groupBy` (`"page"` | `"chunk"`)
|
|
961
|
+
### Deploy and index
|
|
718
962
|
|
|
719
|
-
|
|
720
|
-
- Retrieve full indexed page content as markdown with frontmatter
|
|
721
|
-
- Options: `scope`
|
|
963
|
+
Indexing runs automatically on every Vercel deploy. Set these env vars in the Vercel dashboard:
|
|
722
964
|
|
|
723
|
-
|
|
965
|
+
- `UPSTASH_VECTOR_REST_URL`
|
|
966
|
+
- `UPSTASH_VECTOR_REST_TOKEN`
|
|
967
|
+
- `SEARCHSOCKET_AUTO_INDEX=1`
|
|
724
968
|
|
|
725
|
-
|
|
969
|
+
The Vite plugin handles the rest. Alternatively, use a postbuild script:
|
|
726
970
|
|
|
727
971
|
```json
|
|
728
972
|
{
|
|
729
|
-
"
|
|
730
|
-
"
|
|
731
|
-
|
|
732
|
-
"command": "npx",
|
|
733
|
-
"args": ["searchsocket", "mcp"],
|
|
734
|
-
"env": {}
|
|
735
|
-
}
|
|
973
|
+
"scripts": {
|
|
974
|
+
"build": "vite build",
|
|
975
|
+
"postbuild": "searchsocket index"
|
|
736
976
|
}
|
|
737
977
|
}
|
|
738
978
|
```
|
|
739
979
|
|
|
740
|
-
|
|
741
|
-
|
|
742
|
-
```bash
|
|
743
|
-
claude mcp list
|
|
744
|
-
```
|
|
745
|
-
|
|
746
|
-
### Setup (Claude Desktop)
|
|
747
|
-
|
|
748
|
-
Add to `~/Library/Application Support/Claude/claude_desktop_config.json`:
|
|
980
|
+
### Connect Claude Code to the deployed site
|
|
749
981
|
|
|
750
982
|
```json
|
|
751
983
|
{
|
|
752
984
|
"mcpServers": {
|
|
753
985
|
"searchsocket": {
|
|
754
|
-
"
|
|
755
|
-
"
|
|
756
|
-
"cwd": "/path/to/your/project"
|
|
986
|
+
"type": "http",
|
|
987
|
+
"url": "https://canopy.dev/api/mcp"
|
|
757
988
|
}
|
|
758
989
|
}
|
|
759
990
|
}
|
|
760
991
|
```
|
|
761
992
|
|
|
762
|
-
|
|
993
|
+
Now Claude Code can search the live docs, retrieve page content, and find source files — all backed by the production index that stays current with every deploy.
|
|
763
994
|
|
|
764
|
-
###
|
|
995
|
+
### Excluding pages from search
|
|
765
996
|
|
|
766
|
-
|
|
767
|
-
|
|
768
|
-
|
|
769
|
-
|
|
997
|
+
```svelte
|
|
998
|
+
<!-- src/routes/blog/+page.svelte (archive page) -->
|
|
999
|
+
<svelte:head>
|
|
1000
|
+
<meta name="searchsocket-weight" content="0" />
|
|
1001
|
+
</svelte:head>
|
|
770
1002
|
```
|
|
771
1003
|
|
|
772
|
-
|
|
1004
|
+
Or with the component:
|
|
773
1005
|
|
|
774
|
-
|
|
1006
|
+
```svelte
|
|
1007
|
+
<script>
|
|
1008
|
+
import { SearchSocket } from "searchsocket/svelte";
|
|
1009
|
+
</script>
|
|
775
1010
|
|
|
776
|
-
|
|
1011
|
+
<SearchSocket weight={0} />
|
|
1012
|
+
```
|
|
777
1013
|
|
|
778
|
-
###
|
|
1014
|
+
### Vite SSR config
|
|
1015
|
+
|
|
1016
|
+
```ts
|
|
1017
|
+
// vite.config.ts
|
|
1018
|
+
import { sveltekit } from "@sveltejs/kit/vite";
|
|
1019
|
+
import { defineConfig } from "vite";
|
|
1020
|
+
|
|
1021
|
+
export default defineConfig({
|
|
1022
|
+
plugins: [sveltekit()],
|
|
1023
|
+
ssr: {
|
|
1024
|
+
external: ["searchsocket", "searchsocket/sveltekit", "searchsocket/client"]
|
|
1025
|
+
}
|
|
1026
|
+
});
|
|
1027
|
+
```
|
|
779
1028
|
|
|
780
|
-
|
|
781
|
-
- `JINA_API_KEY` — Jina AI API key for embeddings and reranking
|
|
1029
|
+
## Environment Variables
|
|
782
1030
|
|
|
783
|
-
###
|
|
1031
|
+
### Required
|
|
784
1032
|
|
|
785
|
-
|
|
786
|
-
|
|
787
|
-
|
|
1033
|
+
| Variable | Description |
|
|
1034
|
+
|----------|-------------|
|
|
1035
|
+
| `UPSTASH_VECTOR_REST_URL` | Upstash Vector REST API endpoint |
|
|
1036
|
+
| `UPSTASH_VECTOR_REST_TOKEN` | Upstash Vector REST API token |
|
|
788
1037
|
|
|
789
|
-
|
|
1038
|
+
### Optional
|
|
790
1039
|
|
|
791
|
-
|
|
1040
|
+
| Variable | Description |
|
|
1041
|
+
|----------|-------------|
|
|
1042
|
+
| `SEARCHSOCKET_SCOPE` | Override scope (when `scope.mode: "env"`) |
|
|
1043
|
+
| `SEARCHSOCKET_AUTO_INDEX` | Enable build-triggered indexing (`1`, `true`, or `yes`) |
|
|
1044
|
+
| `SEARCHSOCKET_DISABLE_AUTO_INDEX` | Disable build-triggered indexing |
|
|
1045
|
+
| `SEARCHSOCKET_FORCE_REINDEX` | Force full re-index in CI/CD |
|
|
792
1046
|
|
|
793
|
-
|
|
794
|
-
- `SEARCHSOCKET_AUTO_INDEX` — Enable build-triggered indexing
|
|
795
|
-
- `SEARCHSOCKET_DISABLE_AUTO_INDEX` — Disable build-triggered indexing
|
|
1047
|
+
The CLI automatically loads `.env` from the working directory on startup.
|
|
796
1048
|
|
|
797
|
-
## Configuration
|
|
1049
|
+
## Configuration Reference
|
|
798
1050
|
|
|
799
|
-
|
|
1051
|
+
See [docs/config.md](docs/config.md) for the full configuration reference. Here's a complete example:
|
|
800
1052
|
|
|
801
1053
|
```ts
|
|
802
1054
|
export default {
|
|
@@ -806,41 +1058,24 @@ export default {
|
|
|
806
1058
|
},
|
|
807
1059
|
|
|
808
1060
|
scope: {
|
|
809
|
-
mode: "git",
|
|
1061
|
+
mode: "git", // "fixed" | "git" | "env"
|
|
810
1062
|
fixed: "main",
|
|
811
1063
|
sanitize: true
|
|
812
1064
|
},
|
|
813
1065
|
|
|
1066
|
+
exclude: ["/admin/*", "/api/*"],
|
|
1067
|
+
respectRobotsTxt: true,
|
|
1068
|
+
|
|
814
1069
|
source: {
|
|
815
|
-
mode: "build",
|
|
1070
|
+
mode: "build",
|
|
816
1071
|
staticOutputDir: "build",
|
|
817
|
-
strictRouteMapping: false,
|
|
818
|
-
|
|
819
|
-
// Build mode (recommended for CI/CD)
|
|
820
1072
|
build: {
|
|
821
|
-
outputDir: ".svelte-kit/output",
|
|
822
|
-
previewTimeout: 30000,
|
|
823
1073
|
exclude: ["/api/*"],
|
|
824
1074
|
paramValues: {
|
|
825
1075
|
"/blog/[slug]": ["hello-world", "getting-started"]
|
|
826
1076
|
},
|
|
827
|
-
discover:
|
|
828
|
-
|
|
829
|
-
maxPages: 200,
|
|
830
|
-
maxDepth: 5
|
|
831
|
-
},
|
|
832
|
-
|
|
833
|
-
// Crawl mode (alternative)
|
|
834
|
-
crawl: {
|
|
835
|
-
baseUrl: "http://localhost:4173",
|
|
836
|
-
routes: ["/", "/docs", "/blog"],
|
|
837
|
-
sitemapUrl: "https://example.com/sitemap.xml"
|
|
838
|
-
},
|
|
839
|
-
|
|
840
|
-
// Content files mode (alternative)
|
|
841
|
-
contentFiles: {
|
|
842
|
-
globs: ["src/routes/**/*.md"],
|
|
843
|
-
baseDir: "."
|
|
1077
|
+
discover: true,
|
|
1078
|
+
maxPages: 200
|
|
844
1079
|
}
|
|
845
1080
|
},
|
|
846
1081
|
|
|
@@ -850,77 +1085,77 @@ export default {
|
|
|
850
1085
|
dropSelectors: [".sidebar", ".toc"],
|
|
851
1086
|
ignoreAttr: "data-search-ignore",
|
|
852
1087
|
noindexAttr: "data-search-noindex",
|
|
853
|
-
|
|
1088
|
+
imageDescAttr: "data-search-description"
|
|
854
1089
|
},
|
|
855
1090
|
|
|
856
1091
|
chunking: {
|
|
857
|
-
maxChars:
|
|
1092
|
+
maxChars: 1500,
|
|
858
1093
|
overlapChars: 200,
|
|
859
1094
|
minChars: 250,
|
|
860
|
-
|
|
861
|
-
|
|
862
|
-
prependTitle: true, // prepend page title to chunk text before embedding
|
|
863
|
-
pageSummaryChunk: true // generate synthetic identity chunk per page
|
|
864
|
-
},
|
|
865
|
-
|
|
866
|
-
embeddings: {
|
|
867
|
-
provider: "jina",
|
|
868
|
-
model: "jina-embeddings-v5-text-small",
|
|
869
|
-
apiKey: "jina_...", // direct API key (or use apiKeyEnv)
|
|
870
|
-
apiKeyEnv: "JINA_API_KEY",
|
|
871
|
-
batchSize: 64,
|
|
872
|
-
concurrency: 4
|
|
1095
|
+
prependTitle: true,
|
|
1096
|
+
pageSummaryChunk: true
|
|
873
1097
|
},
|
|
874
1098
|
|
|
875
|
-
|
|
876
|
-
|
|
877
|
-
|
|
878
|
-
url: "libsql://my-db.turso.io", // direct URL (or use urlEnv)
|
|
879
|
-
authToken: "eyJhbGc...", // direct token (or use authTokenEnv)
|
|
880
|
-
urlEnv: "TURSO_DATABASE_URL",
|
|
881
|
-
authTokenEnv: "TURSO_AUTH_TOKEN",
|
|
882
|
-
localPath: ".searchsocket/vectors.db"
|
|
883
|
-
}
|
|
1099
|
+
upstash: {
|
|
1100
|
+
urlEnv: "UPSTASH_VECTOR_REST_URL",
|
|
1101
|
+
tokenEnv: "UPSTASH_VECTOR_REST_TOKEN"
|
|
884
1102
|
},
|
|
885
1103
|
|
|
886
|
-
|
|
887
|
-
|
|
888
|
-
|
|
889
|
-
model: "jina-reranker-v3"
|
|
1104
|
+
search: {
|
|
1105
|
+
dualSearch: true,
|
|
1106
|
+
pageSearchWeight: 0.3
|
|
890
1107
|
},
|
|
891
1108
|
|
|
892
1109
|
ranking: {
|
|
893
1110
|
enableIncomingLinkBoost: true,
|
|
894
1111
|
enableDepthBoost: true,
|
|
895
|
-
pageWeights: {
|
|
896
|
-
|
|
897
|
-
"/docs": 1.15
|
|
898
|
-
},
|
|
899
|
-
minScore: 0,
|
|
1112
|
+
pageWeights: { "/docs": 1.15 },
|
|
1113
|
+
minScoreRatio: 0.70,
|
|
900
1114
|
aggregationCap: 5,
|
|
901
|
-
aggregationDecay: 0.5
|
|
902
|
-
minChunkScoreRatio: 0.5,
|
|
903
|
-
weights: {
|
|
904
|
-
incomingLinks: 0.05,
|
|
905
|
-
depth: 0.03,
|
|
906
|
-
rerank: 1.0,
|
|
907
|
-
aggregation: 0.1
|
|
908
|
-
}
|
|
1115
|
+
aggregationDecay: 0.5
|
|
909
1116
|
},
|
|
910
1117
|
|
|
911
1118
|
api: {
|
|
912
1119
|
path: "/api/search",
|
|
913
|
-
cors: {
|
|
914
|
-
|
|
915
|
-
|
|
916
|
-
|
|
917
|
-
|
|
918
|
-
|
|
919
|
-
|
|
1120
|
+
cors: { allowOrigins: ["https://example.com"] }
|
|
1121
|
+
},
|
|
1122
|
+
|
|
1123
|
+
mcp: {
|
|
1124
|
+
enable: true,
|
|
1125
|
+
handle: { path: "/api/mcp" }
|
|
1126
|
+
},
|
|
1127
|
+
|
|
1128
|
+
llmsTxt: {
|
|
1129
|
+
enable: true,
|
|
1130
|
+
title: "My Project",
|
|
1131
|
+
description: "Documentation for My Project"
|
|
1132
|
+
},
|
|
1133
|
+
|
|
1134
|
+
state: {
|
|
1135
|
+
dir: ".searchsocket"
|
|
920
1136
|
}
|
|
921
1137
|
};
|
|
922
1138
|
```
|
|
923
1139
|
|
|
1140
|
+
## CI/CD
|
|
1141
|
+
|
|
1142
|
+
See [docs/ci.md](docs/ci.md) for ready-to-use GitHub Actions workflows covering:
|
|
1143
|
+
|
|
1144
|
+
- Main branch indexing on push
|
|
1145
|
+
- PR dry-run validation
|
|
1146
|
+
- Preview branch scope isolation
|
|
1147
|
+
- Scheduled scope pruning
|
|
1148
|
+
- Vercel build-triggered indexing
|
|
1149
|
+
|
|
1150
|
+
## Further Reading
|
|
1151
|
+
|
|
1152
|
+
- [Building a Search UI](docs/search-ui.md) — Cmd+K modals, scoped search, styling, and API reference
|
|
1153
|
+
- [Tuning Search Relevance](docs/tuning.md) — visual playground, ranking parameters, and search quality testing
|
|
1154
|
+
- [Configuration Reference](docs/config.md) — all config options, indexing hooks, and custom records
|
|
1155
|
+
- [CI/CD Workflows](docs/ci.md) — GitHub Actions and Vercel integration
|
|
1156
|
+
- [MCP over HTTP Guide](docs/mcp-claude-code.md) — detailed HTTP MCP setup for Claude Code
|
|
1157
|
+
- [Troubleshooting](docs/troubleshooting.md) — common issues, diagnostics, and FAQ
|
|
1158
|
+
|
|
924
1159
|
## License
|
|
925
1160
|
|
|
926
1161
|
MIT
|