npm - searchsocket - Versions diffs - 0.4.0 → 0.6.0 - Mend

searchsocket 0.4.0 → 0.6.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (28)

package/README.md +742 -507
package/dist/cli.js +3504 -1412
package/dist/client.cjs +41 -117
package/dist/client.d.cts +3 -17
package/dist/client.d.ts +3 -17
package/dist/client.js +41 -117
package/dist/index.cjs +2553 -1499
package/dist/index.d.cts +133 -34
package/dist/index.d.ts +133 -34
package/dist/index.js +2551 -1494
package/dist/plugin-C61L-ykY.d.ts +37 -0
package/dist/plugin-DoBW1gkK.d.cts +37 -0
package/dist/scroll.cjs +185 -0
package/dist/scroll.d.cts +42 -0
package/dist/scroll.d.ts +42 -0
package/dist/scroll.js +183 -0
package/dist/sveltekit.cjs +2769 -1389
package/dist/sveltekit.d.cts +3 -43
package/dist/sveltekit.d.ts +3 -43
package/dist/sveltekit.js +2769 -1389
package/dist/templates/search-dialog/SearchDialog.svelte +175 -0
package/dist/templates/search-input/SearchInput.svelte +151 -0
package/dist/templates/search-results/SearchResults.svelte +75 -0
package/dist/{types-z2dw3H6E.d.cts → types-029hl6P2.d.cts} +210 -134
package/dist/{types-z2dw3H6E.d.ts → types-029hl6P2.d.ts} +210 -134
package/package.json +28 -3
package/src/svelte/SearchSocket.svelte +35 -0
package/src/svelte/index.svelte.ts +181 -0

package/README.md CHANGED

@@ -1,34 +1,44 @@
 # SearchSocket
-Semantic site search and MCP retrieval for SvelteKit content projects.
+Semantic site search and MCP retrieval for SvelteKit content projects. Index your site, search it from the browser or AI tools, and scroll users to the exact content they're looking for.
-**Requirements**: Node.js >= 20
+**Requirements**: Node.js >= 20 | **Backend**: [Upstash Vector](https://upstash.com/docs/vector/overall/getstarted) | **License**: MIT
+## How it works
+```
+SvelteKit Pages → Extractor (Cheerio + Turndown) → Chunker → Upstash Vector
+                                                                    ↓
+                    Search UI ← SvelteKit API Hook ← Search Engine + Ranking
+                                       ↓
+                              MCP Endpoint → Claude Code / Claude Desktop
+```
+SearchSocket extracts content from your SvelteKit site, converts it to markdown, splits it into chunks, and stores them in Upstash Vector. At runtime, the SvelteKit hook serves both a search API for your frontend and an MCP endpoint for AI tools.
 ## Features
-- **Embeddings**: Jina AI `jina-embeddings-v5-text-small` with task-specific LoRA adapters (configurable)
-- **Vector Backend**: Turso/libSQL with vector search (local file DB for development, remote for production)
-- **Rerank**: Jina `jina-reranker-v3` enabled by default — same API key
-- **Page Aggregation**: Group results by page with score-weighted chunk decay
-- **Meta Extraction**: Automatically extracts `<meta name="description">` and `<meta name="keywords">` for improved relevance
-- **SvelteKit Integrations**:
-  - `searchsocketHandle()` for `POST /api/search` endpoint
-  - `searchsocketVitePlugin()` for build-triggered indexing
-- **Client Library**: `createSearchClient()` for browser-side search
-- **MCP Server**: Model Context Protocol tools for search and page retrieval
-- **Git-Tracked Markdown Mirror**: Commit-safe deterministic markdown outputs
+- **Semantic + keyword search** — Upstash Vector handles hybrid search with built-in reranking and input enrichment
+- **Dual search** — parallel page-level and chunk-level queries with configurable score blending
+- **Scroll-to-text** — auto-scroll to the matching section when a user clicks a search result, with CSS Highlight API and Text Fragment support
+- **SvelteKit integration** — server hook for the search API, Vite plugin for build-triggered indexing
+- **Svelte 5 components** — reactive `createSearch` store and `<SearchSocket>` metadata component
+- **MCP server** — six tools for Claude Code, Claude Desktop, and other MCP clients (stdio + HTTP)
+- **llms.txt generation** — auto-generate LLM-friendly site indexes during indexing
+- **Four source modes** — index from static output, build manifest, a running server, or raw markdown files
+- **CLI** — init, index, search, dev, status, doctor, clean, prune, test, mcp, add
 ## Install
 ```bash
-# pnpm
 pnpm add -D searchsocket
-# npm
-npm install -D searchsocket
 ```
-SearchSocket is typically a dev dependency for CLI indexing. If you use `searchsocketHandle()` at runtime (e.g., in a Node server adapter), add it as a regular dependency instead.
+SearchSocket is typically a dev dependency since indexing runs at build time. If you use `searchsocketHandle()` at runtime (e.g., in a Node server adapter or serving the MCP endpoint from a production deployment), add it as a regular dependency:
+```bash
+pnpm add searchsocket
+```
 ## Quickstart
@@ -38,100 +48,134 @@ SearchSocket is typically a dev dependency for CLI indexing. If you use `searchs
 pnpm searchsocket init
 ```
-This creates:
-- `searchsocket.config.ts` — minimal config file
-- `.searchsocket/` — state directory (added to `.gitignore`)
+Creates `searchsocket.config.ts`, the `.searchsocket/` state directory, wires up your SvelteKit hooks and Vite config, and generates `.mcp.json` for Claude Code.
 ### 2. Configure
 Minimal config (`searchsocket.config.ts`):
 ```ts
-export default {
-  embeddings: { apiKeyEnv: "JINA_API_KEY" }
-};
+export default {};
 ```
-**That's it!** Turso defaults work out of the box:
-- **Development**: Uses local file DB at `.searchsocket/vectors.db`
-- **Production**: Set `TURSO_DATABASE_URL` and `TURSO_AUTH_TOKEN` to use remote Turso
+That's it — defaults handle the rest. SearchSocket reads `UPSTASH_VECTOR_REST_URL` and `UPSTASH_VECTOR_REST_TOKEN` from your environment automatically.
-### 3. Add SvelteKit API Hook
+### 3. Set environment variables
-Create or update `src/hooks.server.ts`:
+```bash
+# .env
+UPSTASH_VECTOR_REST_URL=https://...
+UPSTASH_VECTOR_REST_TOKEN=...
+```
+Create an [Upstash Vector index](https://console.upstash.com/vector) with the `bge-large-en-v1.5` embedding model (1024 dimensions). Copy the REST URL and token.
+### 4. Add the SvelteKit hook
+The `init` command does this for you, but if you need to do it manually:
 ```ts
+// src/hooks.server.ts
 import { searchsocketHandle } from "searchsocket/sveltekit";
 export const handle = searchsocketHandle();
 ```
-This exposes `POST /api/search` with automatic scope resolution.
+This exposes `POST /api/search`, `GET /api/search/health`, the MCP endpoint at `/api/mcp`, and page retrieval routes.
+If you run into SSR bundling issues, mark SearchSocket as external in your Vite config:
+```ts
+// vite.config.ts
+export default defineConfig({
+  plugins: [sveltekit()],
+  ssr: {
+    external: ["searchsocket", "searchsocket/sveltekit", "searchsocket/client"]
+  }
+});
+```
-### 4. Set Environment Variables
+### 5. Add search to your frontend
-The CLI automatically loads `.env` from the working directory on startup, so your existing `.env` file works out of the box — no wrapper scripts or shell exports needed.
+Copy the search dialog template into your project:
-Development (`.env`):
 ```bash
-JINA_API_KEY=jina_...
+pnpm searchsocket add search-dialog
 ```
-Production (add these for remote Turso):
-```bash
-JINA_API_KEY=jina_...
-TURSO_DATABASE_URL=libsql://your-db.turso.io
-TURSO_AUTH_TOKEN=eyJ...
+This copies a Svelte 5 component to `src/lib/components/search/SearchDialog.svelte` with Cmd+K built in. Import it in your layout and add the scroll-to-text handler:
+```svelte
+<!-- src/routes/+layout.svelte -->
+<script>
+  import { afterNavigate } from "$app/navigation";
+  import { searchsocketScrollToText } from "searchsocket/sveltekit";
+  import SearchDialog from "$lib/components/search/SearchDialog.svelte";
+  afterNavigate(searchsocketScrollToText);
+</script>
+<SearchDialog />
+<slot />
 ```
-### 5. Index Your Content
+Users can now press Cmd+K to search. See [Building a Search UI](docs/search-ui.md) for scoped search, custom styling, and more patterns.
+### 6. Deploy
+SearchSocket is designed to index automatically on deploy. The `init` command already added the Vite plugin to your config. Set these environment variables on your hosting platform (Vercel, Cloudflare, etc.):
+| Variable | Value |
+|----------|-------|
+| `UPSTASH_VECTOR_REST_URL` | Your Upstash Vector REST URL |
+| `UPSTASH_VECTOR_REST_TOKEN` | Your Upstash Vector REST token |
+| `SEARCHSOCKET_AUTO_INDEX` | `1` |
+Every deploy will build your site, index the content, and serve the search API — fully automated.
+For local testing, you can also build and index manually:
 ```bash
-pnpm searchsocket index --changed-only
+pnpm build
+pnpm searchsocket index
+```
+### 7. Connect Claude Code (optional)
+Point Claude Code at your deployed site's MCP endpoint:
+```json
+{
+  "mcpServers": {
+    "searchsocket": {
+      "type": "http",
+      "url": "https://your-site.com/api/mcp"
+    }
+  }
+}
 ```
-SearchSocket auto-detects the source mode based on your config:
-- **`static-output`** (default): Reads prerendered HTML from `build/`
-- **`build`**: Discovers routes from SvelteKit build manifest and renders via preview server
-- **`crawl`**: Fetches pages from a running HTTP server
-- **`content-files`**: Reads markdown/svelte source files directly
+See [MCP Server](#mcp-server) for authentication and other options.
-The indexing pipeline:
-- Extracts content from `<main>` (configurable), including `<meta>` description and keywords
-- Chunks text with semantic heading boundaries
-- Prepends page title to each chunk for embedding context
-- Generates a synthetic summary chunk per page for identity matching
-- Generates embeddings via Jina AI (with task-specific LoRA adapters for indexing vs search)
-- Stores vectors in Turso/libSQL with cosine similarity index
+### Querying the API directly
-### 6. Query
+The search API is also available via HTTP and CLI:
-**Via API:**
 ```bash
+# cURL
 curl -X POST http://localhost:5173/api/search \
   -H "content-type: application/json" \
   -d '{"q":"getting started","topK":5,"groupBy":"page"}'
-```
-**Via client library:**
-```ts
-import { createSearchClient } from "searchsocket/client";
-const client = createSearchClient(); // defaults to /api/search
-const response = await client.search({
-  q: "getting started",
-  topK: 5,
-  groupBy: "page",
-  pathPrefix: "/docs"
-});
+# CLI
+pnpm searchsocket search --q "getting started" --top-k 5
 ```
-**Via CLI:**
-```bash
-pnpm searchsocket search --q "getting started" --top-k 5 --path-prefix /docs
-```
+### Response format
+With `groupBy: "page"` (the default):
-**Response** (with `groupBy: "page"`, the default):
 ```json
 {
   "q": "getting started",
@@ -161,18 +205,16 @@ pnpm searchsocket search --q "getting started" --top-k 5 --path-prefix /docs
     }
   ],
   "meta": {
-    "timingsMs": { "embed": 120, "vector": 15, "rerank": 0, "total": 135 },
-    "usedRerank": false,
-    "modelId": "jina-embeddings-v5-text-small"
+    "timingsMs": { "total": 135 }
   }
 }
 ```
-The `chunks` array appears when a page has multiple matching chunks above the `minChunkScoreRatio` threshold. Use `groupBy: "chunk"` for flat per-chunk results without page aggregation.
+The `chunks` array contains matching sections within each page. Use `groupBy: "chunk"` for flat per-chunk results without page aggregation.
 ## Source Modes
-SearchSocket supports four source modes for loading pages to index.
+SearchSocket supports four ways to load your site content for indexing.
 ### `static-output` (default)
@@ -182,50 +224,37 @@ Reads prerendered HTML files from SvelteKit's build output directory.
 export default {
   source: {
     mode: "static-output",
-    staticOutputDir: "build"
+    staticOutputDir: "build"   // default
   }
 };
 ```
-Best for: Sites with fully prerendered pages. Run `vite build` first, then index.
+Best for fully prerendered sites. Run `vite build` first, then `searchsocket index`.
 ### `build`
-Discovers routes automatically from SvelteKit's build manifest and renders them via an ephemeral `vite preview` server. No manual route configuration needed.
+Discovers routes from SvelteKit's build manifest and renders via an ephemeral `vite preview` server. No manual route lists needed.
 ```ts
 export default {
   source: {
+    mode: "build",
     build: {
-      outputDir: ".svelte-kit/output",   // default
-      previewTimeout: 30000,             // ms to wait for server (default)
-      exclude: ["/api/*", "/admin/*"],   // glob patterns to skip
-      paramValues: {                     // values for dynamic routes
+      exclude: ["/api/*", "/admin/*"],
+      paramValues: {
         "/blog/[slug]": ["hello-world", "getting-started"],
         "/docs/[category]/[page]": ["guides/quickstart", "api/search"]
       },
-      discover: true,                    // crawl internal links to find pages (default: false)
-      seedUrls: ["/"],                   // starting URLs for discovery
-      maxPages: 200,                     // max pages to discover (default: 200)
-      maxDepth: 5                        // max link depth from seed URLs (default: 5)
+      discover: true,        // crawl internal links to find more pages
+      seedUrls: ["/"],
+      maxPages: 200,
+      maxDepth: 5
     }
   }
 };
 ```
-Best for: CI/CD pipelines. Enables `vite build && searchsocket index` with zero route configuration.
-**How it works**:
-1. Parses `.svelte-kit/output/server/manifest-full.js` to discover all page routes
-2. Expands dynamic routes using `paramValues` (skips dynamic routes without values)
-3. Starts an ephemeral `vite preview` server on a random port
-4. Fetches all routes concurrently for SSR-rendered HTML
-5. Provides exact route-to-file mapping (no heuristic matching needed)
-6. Shuts down the preview server
-**Dynamic routes**: Each key in `paramValues` maps to a route ID (e.g., `/blog/[slug]`) or its URL equivalent. Each value in the array replaces all `[param]` segments in the URL. Routes with layout groups like `/(app)/blog/[slug]` also match the URL key `/blog/[slug]`.
-**Link discovery**: Enable `discover: true` to automatically find pages by crawling internal links from `seedUrls`. This is useful when dynamic routes have many parameter values that are impractical to enumerate. The crawler respects `maxPages` and `maxDepth` limits and only follows links within the same origin.
+Best for CI/CD pipelines: `vite build && searchsocket index` with zero route configuration.
 ### `crawl`
@@ -234,24 +263,24 @@ Fetches pages from a running HTTP server.
 ```ts
 export default {
   source: {
+    mode: "crawl",
     crawl: {
       baseUrl: "http://localhost:4173",
-      routes: ["/", "/docs", "/blog"],  // explicit routes
-      sitemapUrl: "https://example.com/sitemap.xml"  // or discover via sitemap
+      routes: ["/", "/docs", "/blog"],
+      sitemapUrl: "https://example.com/sitemap.xml"
     }
   }
 };
 ```
-If `routes` is omitted and no `sitemapUrl` is set, defaults to crawling `["/"]` only.
 ### `content-files`
-Reads markdown and svelte source files directly, without building or serving.
+Reads markdown and Svelte source files directly, without building or serving.
 ```ts
 export default {
   source: {
+    mode: "content-files",
     contentFiles: {
       globs: ["src/routes/**/*.md", "content/**/*.md"],
       baseDir: "."
@@ -262,541 +291,764 @@ export default {
 ## Client Library
-SearchSocket exports a lightweight client for browser-side search:
+### `createSearchClient(options?)`
+Lightweight browser-side search client.
 ```ts
 import { createSearchClient } from "searchsocket/client";
 const client = createSearchClient({
-  endpoint: "/api/search",  // default
-  fetchImpl: fetch           // default; override for SSR or testing
+  endpoint: "/api/search",   // default
+  fetchImpl: fetch            // override for SSR or testing
 });
-const response = await client.search({
+const { results } = await client.search({
   q: "deployment guide",
   topK: 8,
   groupBy: "page",
   pathPrefix: "/docs",
   tags: ["guide"],
-  rerank: true
+  filters: { version: 2 },
+  maxSubResults: 3
 });
-for (const result of response.results) {
-  console.log(result.url, result.title, result.score);
-  if (result.chunks) {
-    for (const chunk of result.chunks) {
-      console.log("  ", chunk.sectionTitle, chunk.score);
-    }
-  }
-}
 ```
-## Vector Backend: Turso/libSQL
-SearchSocket uses **Turso** (libSQL) as its single vector backend, providing a unified experience across development and production.
-### Local Development
-By default, SearchSocket uses a **local file database**:
-- Path: `.searchsocket/vectors.db` (configurable)
-- No account or API keys needed
-- Full vector search with `libsql_vector_idx` and `vector_top_k`
-- Perfect for local development and CI testing
-### Production (Remote Turso)
-For production, switch to **Turso's hosted service**:
-1. **Sign up for Turso** (free tier available):
-   ```bash
-   # Install Turso CLI
-   brew install tursodatabase/tap/turso
-   # Sign up
-   turso auth signup
+### `buildResultUrl(result)`
-   # Create a database
-   turso db create searchsocket-prod
+Builds a URL from a search result that includes scroll-to-text metadata:
-   # Get credentials
-   turso db show searchsocket-prod --url
-   turso db tokens create searchsocket-prod
-   ```
-2. **Set environment variables**:
-   ```bash
-   TURSO_DATABASE_URL=libsql://searchsocket-prod-xxx.turso.io
-   TURSO_AUTH_TOKEN=eyJhbGc...
-   ```
-3. **Index normally** — SearchSocket auto-detects the remote URL and uses it.
-### Direct Credential Passing
-Instead of environment variables, you can pass credentials directly in the config. This is useful for serverless deployments or multi-tenant setups:
+- `_ssk` query parameter — section title for SvelteKit client-side navigation
+- `_sskt` query parameter — text target snippet for precise scroll
+- `#:~:text=` — [Text Fragment](https://developer.mozilla.org/en-US/docs/Web/URI/Fragment/Text_fragments) for native browser scroll on full page loads
 ```ts
-export default {
-  embeddings: {
-    apiKey: "jina_..."  // direct API key (takes precedence over apiKeyEnv)
-  },
-  vector: {
-    turso: {
-      url: "libsql://my-db.turso.io",       // direct URL
-      authToken: "eyJhbGc..."               // direct auth token
-    }
-  }
-};
-```
+import { buildResultUrl } from "searchsocket/client";
-Direct values take precedence over environment variable lookups (`apiKeyEnv`, `urlEnv`, `authTokenEnv`).
-### Dimension Mismatch Auto-Recovery
-When switching embedding models (e.g., from a 1536-dim model to Jina's 1024-dim), the vector dimension changes. SearchSocket automatically detects this and recreates the chunks table with the new dimension — no manual intervention needed. A full re-index (`--force`) is still required after switching models.
+const href = buildResultUrl(result);
+// "/docs/getting-started?_ssk=Installation&_sskt=Install+with+pnpm#:~:text=Install%20with%20pnpm"
+```
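
To illustrate the URL shape described above, here is a minimal sketch of how such a URL could be assembled. It is not the library's implementation: `sketchResultUrl` and the `SketchResult` fields (`url`, `sectionTitle`, `snippet`) are hypothetical names chosen for this example. Only the `_ssk`, `_sskt`, and `#:~:text=` conventions come from the README.

```typescript
// Hypothetical result shape for illustration; the real searchsocket result type may differ.
interface SketchResult {
  url: string;
  sectionTitle?: string;
  snippet?: string;
}

// Assumption: a simplified stand-in for buildResultUrl(), not the library's code.
function sketchResultUrl(result: SketchResult): string {
  // Query parameters read by the client-side scroll handler.
  const params = new URLSearchParams();
  if (result.sectionTitle) params.set("_ssk", result.sectionTitle);
  if (result.snippet) params.set("_sskt", result.snippet);
  const query = params.toString();

  // Text Fragment directive for full page loads. "-" is a separator inside
  // text directives, so it is percent-encoded in addition to encodeURIComponent.
  const fragment = result.snippet
    ? `#:~:text=${encodeURIComponent(result.snippet).replace(/-/g, "%2D")}`
    : "";

  return `${result.url}${query ? `?${query}` : ""}${fragment}`;
}

const sketchHref = sketchResultUrl({
  url: "/docs/getting-started",
  sectionTitle: "Installation",
  snippet: "Install with pnpm"
});
console.log(sketchHref);
```

Note the two encodings in play: `URLSearchParams` serializes spaces as `+`, while the fragment uses `%20`, which matches the example URL in the README.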
-### Why Turso?
+## Svelte 5 Integration
+### `createSearch(options?)`
+A reactive search store built on Svelte 5 runes with debouncing and LRU caching.
+```svelte
+<script>
+  import { createSearch } from "searchsocket/svelte";
+  import { buildResultUrl } from "searchsocket/client";
+  const search = createSearch({
+    endpoint: "/api/search",
+    debounce: 250,            // ms (default)
+    cache: true,              // LRU result caching (default)
+    cacheSize: 50,            // max cached queries (default)
+    topK: 10,
+    groupBy: "page",
+    pathPrefix: "/docs"       // scope search to a section
+  });
+</script>
+<input bind:value={search.query} placeholder="Search docs..." />
+{#if search.loading}
+  <p>Searching...</p>
+{/if}
+{#if search.error}
+  <p class="error">{search.error.message}</p>
+{/if}
+{#each search.results as result}
+  <a href={buildResultUrl(result)}>
+    <strong>{result.title}</strong>
+    {#if result.sectionTitle}
+      <span>— {result.sectionTitle}</span>
+    {/if}
+  </a>
+  <p>{result.snippet}</p>
+{/each}
+```
-- **Single backend** — one unified Turso/libSQL store for vectors, metadata, and state
-- **Local-first development** — zero external dependencies for local dev
-- **Production-ready** — same codebase scales to remote hosted DB
-- **Cost-effective** — Turso free tier includes 9GB storage, 500M row reads/month
-- **Vector search native** — `F32_BLOB` vectors, cosine similarity index, `vector_top_k` ANN queries
+Call `search.destroy()` to clean up when no longer needed (automatic in component context).
-## Serverless Deployment (Vercel, Netlify, etc.)
+### `<SearchSocket>` component
-SearchSocket works on serverless platforms with a few adjustments:
+Declarative meta tag component for controlling per-page search behavior:
-### Requirements
+```svelte
+<script>
+  import { SearchSocket } from "searchsocket/svelte";
+</script>
-1. **Remote Turso database** — local SQLite is not available in serverless (no persistent filesystem). Set `TURSO_DATABASE_URL` and `TURSO_AUTH_TOKEN` as platform environment variables.
+<!-- Boost this page's search ranking -->
+<SearchSocket weight={1.2} />
-2. **Inline config via `rawConfig`** — the default config loader uses `jiti` to import `searchsocket.config.ts` from disk, which isn't bundled in serverless. Use `rawConfig` to pass config inline:
+<!-- Exclude from search -->
+<SearchSocket noindex />
-```ts
-// hooks.server.ts (Vercel / Netlify)
-import { searchsocketHandle } from "searchsocket/sveltekit";
+<!-- Add filterable tags -->
+<SearchSocket tags={["guide", "advanced"]} />
-export const handle = searchsocketHandle({
-  rawConfig: {
-    project: { id: "my-docs-site" },
-    source: { mode: "static-output" },
-    embeddings: { apiKeyEnv: "JINA_API_KEY" },
-  }
-});
+<!-- Add structured metadata (filterable via search API) -->
+<SearchSocket meta={{ version: 2, category: "api" }} />
 ```
-3. **Environment variables** — set these on your platform dashboard:
-   - `JINA_API_KEY`
-   - `TURSO_DATABASE_URL`
-   - `TURSO_AUTH_TOKEN`
+The component renders `<meta>` tags in `<svelte:head>` that SearchSocket reads during indexing.
-### Rate Limiting
+### Template components
-The built-in `InMemoryRateLimiter` auto-disables on serverless platforms (it resets on every cold start). Use your platform's WAF or edge rate-limiting instead.
+Copy ready-made search UI components into your project:
-### What Only Applies to Indexing
+```bash
+pnpm searchsocket add search-dialog
+pnpm searchsocket add search-input
+pnpm searchsocket add search-results
+```
-The following features are only used during `searchsocket index` (CLI), not the search handler:
-- `ensureStateDirs` — creates `.searchsocket/` state directories
-- Markdown mirror — writes `.searchsocket/mirror/` files
-- Local SQLite fallback — only needed when `TURSO_DATABASE_URL` is not set
+These are Svelte 5 components copied to `src/lib/components/search/` (configurable via `--dir`). They're starting points to customize, not dependencies.
-### Adapter Guidance
+## Scroll-to-Text Navigation
-| Platform | Adapter | Notes |
-|----------|---------|-------|
-| Vercel | `adapter-auto` (default) | Serverless — use `rawConfig` + remote Turso |
-| Netlify | `adapter-netlify` | Serverless — same as Vercel |
-| VPS / Docker | `adapter-node` | Long-lived process — no limitations, local SQLite works |
+When a user clicks a search result, SearchSocket scrolls them to the matching section on the destination page.
-## Embeddings: Jina AI
+### Setup
-SearchSocket uses **Jina AI's embedding models** to convert text into semantic vectors. A single `JINA_API_KEY` powers both embeddings and optional reranking.
+Add the scroll handler to your root layout:
-### Default Model
+```svelte
+<!-- src/routes/+layout.svelte -->
+<script>
+  import { afterNavigate } from '$app/navigation';
+  import { searchsocketScrollToText } from 'searchsocket/sveltekit';
-- **Model**: `jina-embeddings-v5-text-small`
-- **Dimensions**: 1024 (default)
-- **Cost**: ~$0.00005 per 1K tokens
-- **Task adapters**: Uses `retrieval.passage` for indexing, `retrieval.query` for search queries (LoRA task-specific adapters for better retrieval quality)
+  afterNavigate(searchsocketScrollToText);
+</script>
+```
-### How It Works
+### How it works
-1. **Chunking**: Text is split into semantic chunks (default 2200 chars, 200 overlap)
-2. **Title Prepend**: Page title is prepended to each chunk for better context (`chunking.prependTitle`, default: true)
-3. **Summary Chunk**: A synthetic identity chunk is generated per page with title, URL, and first paragraph (`chunking.pageSummaryChunk`, default: true)
-4. **Embedding**: Each chunk is sent to Jina's embedding API with the `retrieval.passage` task adapter
-5. **Batching**: Requests batched (64 texts per request) for efficiency
-6. **Storage**: Vectors stored in Turso with metadata (URL, title, tags, depth, etc.)
+1. `buildResultUrl()` encodes the section title and text snippet into the URL
+2. On SvelteKit client-side navigation, the `afterNavigate` hook reads `_ssk`/`_sskt` params
+3. A TreeWalker-based text mapper finds the exact position in the DOM
+4. The page scrolls smoothly to the match
+5. The matching text is highlighted using the [CSS Custom Highlight API](https://developer.mozilla.org/en-US/docs/Web/API/CSS_Custom_Highlight_API) (with a DOM fallback for older browsers)
+6. On full page loads, browsers that support Text Fragments (`#:~:text=`) handle scrolling natively
-### Cost Estimation
+The highlight fades after 2 seconds. Customize with CSS:
-Use `--dry-run` to preview costs:
-```bash
-pnpm searchsocket index --dry-run
+```css
+::highlight(ssk-highlight) {
+  background-color: rgba(250, 204, 21, 0.4);
+}
 ```
-Output:
-```
-pages processed: 42
-chunks total: 156
-chunks changed: 156
-embeddings created: 156
-estimated tokens: 32,400
-estimated cost (USD): $0.000648
-```
+## Search & Ranking
-### Reranking
+### Dual search
-Since embeddings and reranking share the same Jina API key, enabling reranking is one boolean:
+By default, SearchSocket runs two parallel queries — one against page-level summaries and one against individual chunks — then blends the scores:
 ```ts
 export default {
-  embeddings: { apiKeyEnv: "JINA_API_KEY" },
-  rerank: { enabled: true }
+  search: {
+    dualSearch: true,          // default
+    pageSearchWeight: 0.3      // weight of page results vs chunks (0-1)
+  }
 };
 ```
-**Note**: Changing the model after indexing requires re-indexing with `--force`.
-## Search & Ranking
-### Page Aggregation
+### Page aggregation
-By default (`groupBy: "page"`), SearchSocket groups chunk results by page URL and computes a page-level score:
+With `groupBy: "page"` (default), chunk results are grouped by page URL:
 1. The top chunk score becomes the base page score
-2. Additional matching chunks contribute a decaying bonus: `chunk_score * decay^i`
-3. Optional per-URL page weights are applied multiplicatively
+2. Additional matching chunks add a decaying bonus: `chunk_score * decay^i`
+3. Per-URL page weights are applied multiplicatively
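
The three steps above can be sketched as a small scoring function. The README does not spell out the exact formula, so this is one plausible reading under stated assumptions: the bonus is summed over the additional chunks with `i` starting at 1, at most `aggregationCap` chunks contribute, the bonus is scaled by `weights.aggregation`, and the page weight is applied last. `sketchPageScore` and `AggregationOptions` are hypothetical names.

```typescript
// Hypothetical option bag; the field names loosely mirror the ranking config keys.
interface AggregationOptions {
  aggregationCap: number;    // max chunks contributing to the page score
  aggregationDecay: number;  // decay factor for additional chunks
  aggregationWeight: number; // assumed to scale the summed bonus
  pageWeight: number;        // per-URL multiplier
}

// Assumption: one interpretation of the documented steps, not the library's code.
function sketchPageScore(chunkScores: number[], opts: AggregationOptions): number {
  if (chunkScores.length === 0) return 0;

  // Highest-scoring chunks first, limited to the cap.
  const sorted = [...chunkScores].sort((a, b) => b - a).slice(0, opts.aggregationCap);
  const top = sorted[0] ?? 0;
  const rest = sorted.slice(1);

  // Each additional chunk contributes chunk_score * decay^i, with i = 1, 2, ...
  const bonus = rest.reduce(
    (sum, score, idx) => sum + score * Math.pow(opts.aggregationDecay, idx + 1),
    0
  );

  return (top + opts.aggregationWeight * bonus) * opts.pageWeight;
}

// Worked example: bonus = 0.6 * 0.5 + 0.4 * 0.25 = 0.4,
// so the score is (0.8 + 0.1 * 0.4) * 1.15, roughly 0.966.
const sketchScore = sketchPageScore([0.8, 0.6, 0.4], {
  aggregationCap: 5,
  aggregationDecay: 0.5,
  aggregationWeight: 0.1,
  pageWeight: 1.15
});
console.log(sketchScore.toFixed(3));
```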
-Configure aggregation behavior:
+### Ranking configuration
 ```ts
 export default {
   ranking: {
-    minScore: 0,                // minimum absolute score to include in results (default: 0, disabled)
-    aggregationCap: 5,          // max chunks contributing to page score (default: 5)
-    aggregationDecay: 0.5,      // decay factor for additional chunks (default: 0.5)
-    minChunkScoreRatio: 0.5,    // threshold for sub-chunks in results (default: 0.5)
-    pageWeights: {              // per-URL score multipliers
-      "/": 1.1,
+    enableIncomingLinkBoost: true,    // boost pages with more internal links pointing to them
+    enableDepthBoost: true,           // boost shallower pages (/ > /docs > /docs/api)
+    enableFreshnessBoost: false,      // boost recently published content
+    enableAnchorTextBoost: false,     // boost pages whose link text matches the query
+    pageWeights: {                    // per-URL score multipliers (prefix matching)
+      "/": 0.95,
       "/docs": 1.15,
-      "/download": 1.2
+      "/download": 1.05
     },
+    aggregationCap: 5,               // max chunks contributing to page score
+    aggregationDecay: 0.5,           // decay for additional chunks
+    minScoreRatio: 0.70,             // drop results below 70% of best score
+    scoreGapThreshold: 0.4,          // trim results >40% below best
+    minChunkScoreRatio: 0.5,         // threshold for sub-chunks
     weights: {
-      aggregation: 0.1,        // weight of aggregation bonus (default: 0.1)
-      incomingLinks: 0.05,     // incoming link boost weight (default: 0.05)
-      depth: 0.03,             // URL depth boost weight (default: 0.03)
-      rerank: 1.0              // reranker score weight (default: 1.0)
+      incomingLinks: 0.05,
+      depth: 0.03,
+      aggregation: 0.1,
+      titleMatch: 0.15,
+      freshness: 0.1,
+      anchorText: 0.10
     }
   }
 };
 ```
-`pageWeights` supports exact URL matches and prefix matching. A weight of `1.15` on `"/docs"` boosts all pages under `/docs/` by 15%. Use gentle values (1.05-1.2x) since they compound with aggregation.
-`minScore` filters out low-relevance results before they reach the client. Set to a value like `0.3` to remove noise. In page mode, pages below the threshold are dropped; in chunk mode, individual chunks are filtered. Default is `0` (disabled).
-### Chunk Mode
-Use `groupBy: "chunk"` for flat per-chunk results without page aggregation:
-```bash
-curl -X POST http://localhost:5173/api/search \
-  -H "content-type: application/json" \
-  -d '{"q":"vector search","topK":10,"groupBy":"chunk"}'
-```
+Use gentle `pageWeights` values (0.9–1.2) since they compound with other boosts.
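
As a rough illustration of the `minScoreRatio` comment in the config above ("drop results below 70% of best score"), the filter could look like the sketch below. This assumes the cutoff is simply `bestScore * minScoreRatio`. `sketchFilterByRatio` and `ScoredResult` are hypothetical names, and the library's actual trimming logic, including `scoreGapThreshold`, may differ.

```typescript
// Hypothetical minimal result shape for illustration.
interface ScoredResult {
  url: string;
  score: number;
}

// Assumption: keep only results scoring at least minScoreRatio * best score.
function sketchFilterByRatio(results: ScoredResult[], minScoreRatio: number): ScoredResult[] {
  if (results.length === 0) return [];
  const best = Math.max(...results.map((r) => r.score));
  const cutoff = best * minScoreRatio;
  return results.filter((r) => r.score >= cutoff);
}

// With a best score of 0.9 and a ratio of 0.70, the cutoff is about 0.63.
const sketchKept = sketchFilterByRatio(
  [
    { url: "/docs", score: 0.9 },
    { url: "/docs/api", score: 0.7 },
    { url: "/blog", score: 0.62 },
    { url: "/about", score: 0.5 }
  ],
  0.7
);
console.log(sketchKept.map((r) => r.url));
```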
 ## Build-Triggered Indexing
-Automatically index after each SvelteKit build.
+The recommended workflow is to index automatically on every deploy. Add the Vite plugin to your config:
-**`vite.config.ts` or `svelte.config.js`:**
 ```ts
+// vite.config.ts
+import { sveltekit } from "@sveltejs/kit/vite";
 import { searchsocketVitePlugin } from "searchsocket/sveltekit";
 export default {
   plugins: [
-    svelteKitPlugin(),
+    sveltekit(),
     searchsocketVitePlugin({
-      enabled: true,        // or check process.env.SEARCHSOCKET_AUTO_INDEX
-      changedOnly: true,    // incremental indexing (faster)
-      verbose: false
+      changedOnly: true,    // incremental indexing (default)
+      verbose: true
     })
   ]
 };
 ```
-**Environment control:**
+### Vercel / Cloudflare / Netlify
+Set these environment variables in your hosting platform:
+| Variable | Value |
+|----------|-------|
+| `UPSTASH_VECTOR_REST_URL` | Your Upstash Vector REST URL |
+| `UPSTASH_VECTOR_REST_TOKEN` | Your Upstash Vector REST token |
+| `SEARCHSOCKET_AUTO_INDEX` | `1` |
+Every deploy will build your site, index the content into Upstash, and serve the search API and MCP endpoint — fully automated.
+### Environment variable control
 ```bash
-# Enable via env var
+# Enable indexing on build
 SEARCHSOCKET_AUTO_INDEX=1 pnpm build
-# Disable via env var
+# Disable temporarily
 SEARCHSOCKET_DISABLE_AUTO_INDEX=1 pnpm build
+# Force full rebuild (ignore incremental cache)
+SEARCHSOCKET_FORCE_REINDEX=1 pnpm build
+```
+## Making Images Searchable
+SearchSocket converts images to text during extraction using this priority chain:
+1. `data-search-description` on the `<img>` — your explicit description
+2. `data-search-description` on the parent `<figure>`
+3. `alt` text + `<figcaption>` combined
+4. `alt` text alone (filters generic words like "image", "icon")
+5. `<figcaption>` alone
+6. Removed — images with no useful text are dropped
+```html
+<img
+  src="/screenshots/settings.png"
+  alt="Settings page"
+  data-search-description="The settings page showing API key configuration, theme selection, and notification preferences"
+/>
+```
+Works with SvelteKit's `enhanced:img`:
+```svelte
+<enhanced:img
+  src="./screenshots/dashboard.png"
+  alt="Dashboard"
+  data-search-description="Main dashboard showing active projects and indexing status"
+/>
+```
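The priority chain above can be sketched as a small pure function. This is an illustration of the documented order, not SearchSocket's actual extraction code. The `ImageInfo` shape, the `imageToText` name, the `" - "` separator used to combine `alt` with the caption, and the exact generic-word list beyond "image" and "icon" are all assumptions. The sketch also applies the generic-word filter before combining `alt` with a caption, which the README does not specify.

```typescript
// Hypothetical types and names for illustration only.
interface ImageInfo {
  imgDescription?: string;    // data-search-description on the <img>
  figureDescription?: string; // data-search-description on the parent <figure>
  alt?: string;               // alt attribute
  figcaption?: string;        // text of a sibling <figcaption>
}

// "image" and "icon" come from the README; the full list is an assumption.
const GENERIC_ALT = new Set(["image", "icon"]);

function clean(value: string | undefined): string {
  return (value ?? "").trim();
}

// Returns the text to index for an image, or null if the image is dropped.
function imageToText(info: ImageInfo): string | null {
  const imgDesc = clean(info.imgDescription);
  if (imgDesc) return imgDesc;                      // 1. explicit description

  const figDesc = clean(info.figureDescription);
  if (figDesc) return figDesc;                      // 2. description on <figure>

  const rawAlt = clean(info.alt);
  const alt = GENERIC_ALT.has(rawAlt.toLowerCase()) ? "" : rawAlt;
  const caption = clean(info.figcaption);

  if (alt && caption) return `${alt} - ${caption}`; // 3. alt + figcaption
  if (alt) return alt;                              // 4. alt alone
  if (caption) return caption;                      // 5. figcaption alone
  return null;                                      // 6. removed
}
```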
+## MCP Server
+SearchSocket includes an MCP server that gives Claude Code, Claude Desktop, and other MCP clients direct access to your site's search index. The MCP endpoint is built into `searchsocketHandle()` — once your site is deployed, any MCP client can connect to it over HTTP.
+### Available tools
+| Tool | Description |
+|------|-------------|
+| `search` | Semantic search with filtering, grouping, and reranking |
+| `get_page` | Retrieve full page markdown with frontmatter |
+| `list_pages` | Cursor-paginated page listing |
+| `get_site_structure` | Hierarchical page tree |
+| `find_source_file` | Locate the SvelteKit source file for content |
+| `get_related_pages` | Find related pages by links, semantics, and structure |
+### Connecting to your deployed site
+The recommended setup is to connect Claude Code to your deployed site's MCP endpoint. This way the index stays up to date automatically as you deploy, and there's no local process to manage.
+Add `.mcp.json` to your project root:
+```json
+{
+  "mcpServers": {
+    "searchsocket": {
+      "type": "http",
+      "url": "https://your-site.com/api/mcp"
+    }
+  }
+}
 ```
-## Git-Tracked Markdown Mirror
+That's it. Restart Claude Code and the six tools listed above become available. You can search your docs, retrieve page content, and find source files directly from the AI assistant.
-Indexing writes a **deterministic markdown mirror**:
+To protect the endpoint, add API key authentication:
+```ts
+// src/hooks.server.ts
+import { searchsocketHandle } from "searchsocket/sveltekit";
+export const handle = searchsocketHandle({
+  rawConfig: {
+    mcp: {
+      handle: {
+        apiKey: process.env.SEARCHSOCKET_MCP_API_KEY
+      }
+    }
+  }
+});
 ```
-.searchsocket/pages/<scope>/<path>.md
+Then pass the key in `.mcp.json`:
+```json
+{
+  "mcpServers": {
+    "searchsocket": {
+      "type": "http",
+      "url": "https://your-site.com/api/mcp",
+      "headers": {
+        "Authorization": "Bearer ${SEARCHSOCKET_MCP_API_KEY}"
+      }
+    }
+  }
+}
+```
+The `${SEARCHSOCKET_MCP_API_KEY}` syntax references an environment variable so you don't hardcode secrets in `.mcp.json`.
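To make the substitution concrete, here is a minimal sketch of what `${NAME}` expansion does to the header value. It is not Claude Code's implementation. `expandEnv` is a hypothetical helper, and throwing on a missing variable is an assumption chosen so that a forgotten secret fails loudly.

```typescript
// Hypothetical helper for illustration; not Claude Code's implementation.
function expandEnv(
  template: string,
  env: Record<string, string | undefined>
): string {
  // Replace each ${NAME} placeholder with the value of env[NAME].
  return template.replace(
    /\$\{([A-Za-z_][A-Za-z0-9_]*)\}/g,
    (_match: string, name: string) => {
      const value = env[name];
      if (value === undefined) {
        // Assumption: fail instead of sending an empty credential.
        throw new Error(`Missing environment variable: ${name}`);
      }
      return value;
    }
  );
}

// The header value from .mcp.json, expanded against a sample environment.
const header = expandEnv("Bearer ${SEARCHSOCKET_MCP_API_KEY}", {
  SEARCHSOCKET_MCP_API_KEY: "example-key",
});
```

With the sample environment, `header` holds `Bearer example-key`, which is the value sent in the `Authorization` header.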
+### Auto-approving in Claude Code
+Skip the approval prompt each time a tool is called:
+```json
+{
+  "allowedMcpServers": [
+    { "serverName": "searchsocket" }
+  ]
+}
 ```
-Example:
+Add this to `.claude/settings.json` in your project.
+### Local development
+During local development, you can point to your dev server instead:
+```json
+{
+  "mcpServers": {
+    "searchsocket": {
+      "type": "http",
+      "url": "http://localhost:5173/api/mcp"
+    }
+  }
+}
 ```
-.searchsocket/pages/main/docs/intro.md
+### Claude Desktop
+Claude Desktop launches the server as a local stdio process. On macOS, add this to `~/Library/Application Support/Claude/claude_desktop_config.json`:
+```json
+{
+  "mcpServers": {
+    "searchsocket": {
+      "command": "npx",
+      "args": ["searchsocket", "mcp"],
+      "cwd": "/path/to/your/project"
+    }
+  }
+}
 ```
-Each file contains:
-- Frontmatter: URL, title, scope, route file, metadata
-- Markdown: Extracted content
+### Standalone HTTP server
-**Why commit it?**
-- Content workflows (edit markdown, regenerate embeddings)
-- Version control for indexed content
-- Debugging (see exactly what was indexed)
-- Offline search (grep the mirror)
+Run the MCP server as a standalone process (outside SvelteKit):
-Add to `.gitignore` if you don't need it:
+```bash
+pnpm searchsocket mcp --transport http --port 3338
 ```
-.searchsocket/pages/
+## llms.txt Generation
+Generate [llms.txt](https://llmstxt.org/) files during indexing — a standardized way to make your site content available to LLMs.
+```ts
+// searchsocket.config.ts
+export default {
+  project: {
+    baseUrl: "https://example.com"
+  },
+  llmsTxt: {
+    enable: true,
+    title: "My Project",
+    description: "Documentation for My Project",
+    outputPath: "static/llms.txt",    // default
+    generateFull: true,                // also generate llms-full.txt
+    serveMarkdownVariants: false       // serve /page.md variants via the hook
+  }
+};
 ```
-## Commands
+After indexing, `llms.txt` (page index with links) and `llms-full.txt` (full content) are written to your static directory and served by `searchsocketHandle()`.
+## CLI Commands
 ### `searchsocket init`
-Initialize config and state directory.
+Initialize config and state directory. Creates `searchsocket.config.ts`, `.searchsocket/`, `.mcp.json`, and wires up your hooks and Vite config.
 ```bash
 pnpm searchsocket init
+pnpm searchsocket init --non-interactive
 ```
 ### `searchsocket index`
-Index content into vectors.
+Index content into Upstash Vector.
 ```bash
-# Incremental (only changed chunks)
-pnpm searchsocket index --changed-only
+pnpm searchsocket index                    # incremental (default: --changed-only)
+pnpm searchsocket index --force            # full re-index
+pnpm searchsocket index --source build     # override source mode
+pnpm searchsocket index --scope staging    # override scope
+pnpm searchsocket index --dry-run          # preview without writing
+pnpm searchsocket index --max-pages 10     # limit for testing
+pnpm searchsocket index --verbose          # detailed output
+pnpm searchsocket index --json             # machine-readable output
+```
-# Full re-index
-pnpm searchsocket index --force
+### `searchsocket search`
-# Preview cost without indexing
-pnpm searchsocket index --dry-run
+CLI search for testing.
-# Override source mode
-pnpm searchsocket index --source build
+```bash
+pnpm searchsocket search --q "getting started" --top-k 5
+pnpm searchsocket search --q "api" --path-prefix /docs
+```
-# Limit for testing
-pnpm searchsocket index --max-pages 10 --max-chunks 50
+### `searchsocket dev`
-# Override scope
-pnpm searchsocket index --scope staging
+Watch for file changes and auto-reindex, with optional playground UI.
-# Verbose output
-pnpm searchsocket index --verbose
+```bash
+pnpm searchsocket dev                                # watch + playground at :3337
+pnpm searchsocket dev --mcp --mcp-port 3338          # also start MCP HTTP server
+pnpm searchsocket dev --no-playground                 # watch only
 ```
 ### `searchsocket status`
-Show indexing status, scope, and vector health.
+Show indexing status and backend health.
 ```bash
 pnpm searchsocket status
+```
+### `searchsocket doctor`
+Validate config, env vars, provider connectivity, and write access.
-# Output:
-# project: my-site
-# resolved scope: main
-# embedding model: jina-embeddings-v5-text-small
-# vector backend: turso/libsql (local (.searchsocket/vectors.db))
-# vector health: ok
-# last indexed (main): 2025-02-23T10:30:00Z
-# tracked chunks: 156
-# last estimated tokens: 32,400
-# last estimated cost: $0.000648
+```bash
+pnpm searchsocket doctor
 ```
-### `searchsocket dev`
+### `searchsocket test`
-Watch for file changes and auto-reindex.
+Run search quality assertions against the live index.
 ```bash
-pnpm searchsocket dev
+pnpm searchsocket test                              # uses searchsocket.test.json
+pnpm searchsocket test --file custom-tests.json     # custom test file
+```
+Test file format:
-# With MCP server
-pnpm searchsocket dev --mcp --mcp-port 3338
+```json
+[
+  {
+    "query": "installation guide",
+    "expect": {
+      "topResult": "/docs/getting-started",
+      "inTop5": ["/docs/getting-started", "/docs/quickstart"]
+    }
+  }
+]
 ```
-Watches:
-- `src/routes/**` (route files)
-- `build/` (if static-output mode)
-- Build output dir (if build mode)
-- Content files (if content-files mode)
-- `searchsocket.config.ts` (if crawl or build mode)
+Reports pass/fail per assertion and Mean Reciprocal Rank (MRR) across all queries.
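For reference, Mean Reciprocal Rank averages `1 / rank` of the expected result over all queries, counting a query as 0 when the expected result is missing. The sketch below shows that arithmetic. The `QueryOutcome` shape is hypothetical, and using `topResult` as the expected URL is an assumption about how the test runner scores queries.

```typescript
// Hypothetical shape for illustration; not the searchsocket test runner's types.
interface QueryOutcome {
  expected: string; // URL the test expects (assumed to be `topResult`)
  ranked: string[]; // result URLs in the order the search returned them
}

// 1 / rank of the expected URL, or 0 if it was not returned at all.
function reciprocalRank(outcome: QueryOutcome): number {
  const index = outcome.ranked.indexOf(outcome.expected);
  return index === -1 ? 0 : 1 / (index + 1);
}

// Average reciprocal rank across all queries (0 for an empty run).
function meanReciprocalRank(outcomes: QueryOutcome[]): number {
  if (outcomes.length === 0) return 0;
  const total = outcomes.reduce((sum, o) => sum + reciprocalRank(o), 0);
  return total / outcomes.length;
}

const mrr = meanReciprocalRank([
  { expected: "/docs/getting-started", ranked: ["/docs/getting-started", "/docs/quickstart"] }, // rank 1
  { expected: "/docs/config", ranked: ["/docs/tuning", "/docs/config"] },                       // rank 2
  { expected: "/docs/ci", ranked: ["/docs/tuning", "/docs/config"] },                           // missing
]);
```

Here the three queries contribute 1, 0.5, and 0, so `mrr` is 0.5. A score of 1.0 means every expected page ranked first.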
 ### `searchsocket clean`
-Delete local state and optionally remote vectors.
+Delete local state and optionally remote indexes.
 ```bash
-# Local state only
-pnpm searchsocket clean
-# Local + remote vectors
-pnpm searchsocket clean --remote --scope staging
+pnpm searchsocket clean                    # local state only
+pnpm searchsocket clean --remote           # also delete remote scope
+pnpm searchsocket clean --scope staging    # specific scope
 ```
 ### `searchsocket prune`
-Delete stale scopes (e.g., deleted git branches).
+List and delete stale scopes. Compares against git branches to find orphaned scopes.
 ```bash
-# Dry run (shows what would be deleted)
-pnpm searchsocket prune --older-than 30d
+pnpm searchsocket prune                       # dry-run (default)
+pnpm searchsocket prune --apply               # actually delete
+pnpm searchsocket prune --older-than 30d      # only scopes older than 30 days
+```
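Conceptually, finding orphaned scopes is a set difference between the scopes present in the index and the branches that still exist. The sketch below illustrates that idea only. `findOrphanedScopes` and `sanitizeScope` are hypothetical names, and the sanitization rule (lowercase, with other characters replaced by `-`) is an assumption, not the behavior of the `scope.sanitize` option.

```typescript
// Hypothetical helpers for illustration; not part of the searchsocket API.

// Assumption: branch names are lowercased and any character outside
// [a-z0-9_-] becomes "-" when turned into a scope name.
function sanitizeScope(branch: string): string {
  return branch.toLowerCase().replace(/[^a-z0-9_-]/g, "-");
}

// Scopes in the index that no longer correspond to any git branch.
function findOrphanedScopes(
  indexedScopes: string[],
  gitBranches: string[]
): string[] {
  const live = new Set(gitBranches.map(sanitizeScope));
  return indexedScopes.filter((scope) => !live.has(scope));
}

const orphaned = findOrphanedScopes(
  ["main", "feature-search", "old-experiment"],
  ["main", "feature/search"]
);
```

In this example only `old-experiment` is reported, since `feature/search` still maps to the `feature-search` scope under the assumed rule.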
-# Apply deletions
-pnpm searchsocket prune --older-than 30d --apply
+### `searchsocket mcp`
+Run the MCP server standalone.
-# Use custom scope list
-pnpm searchsocket prune --scopes-file active-branches.txt --apply
+```bash
+pnpm searchsocket mcp                                   # stdio (default)
+pnpm searchsocket mcp --transport http --port 3338       # HTTP
+pnpm searchsocket mcp --access public --api-key SECRET   # public with auth
 ```
-### `searchsocket doctor`
+### `searchsocket add`
-Validate config, env vars, and connectivity.
+Copy Svelte 5 search UI template components into your project.
 ```bash
-pnpm searchsocket doctor
-# Output:
-# PASS config parse
-# PASS env JINA_API_KEY
-# PASS turso/libsql (local file: .searchsocket/vectors.db)
-# PASS source: build manifest
-# PASS source: vite binary
-# PASS embedding provider connectivity
-# PASS vector backend connectivity
-# PASS vector backend write permission
-# PASS state directory writable
+pnpm searchsocket add search-dialog
+pnpm searchsocket add search-input
+pnpm searchsocket add search-results
+pnpm searchsocket add search-dialog --dir src/lib/components/ui  # custom dir
 ```
-### `searchsocket mcp`
+## Real-World Example
-Run MCP server for Claude Desktop / other MCP clients.
+Here's how [Canopy](https://canopy.dev) integrates SearchSocket into a production SvelteKit site.
-```bash
-# stdio transport (default)
-pnpm searchsocket mcp
+### Configuration
-# HTTP transport
-pnpm searchsocket mcp --transport http --port 3338
+```ts
+// searchsocket.config.ts
+export default {
+  project: {
+    id: "canopy-website",
+    baseUrl: "https://canopy.dev"
+  },
+  source: {
+    mode: "build"
+  },
+  extract: {
+    dropSelectors: [".nav-blur", ".mobile-overlay", ".docs-sidebar"]
+  },
+  ranking: {
+    minScoreRatio: 0.70,
+    pageWeights: {
+      "/": 0.95,
+      "/download": 1.05,
+      "/docs/**": 1.05
+    },
+    aggregationCap: 3,
+    aggregationDecay: 0.3
+  }
+};
 ```
-### `searchsocket search`
+### Server hook
-CLI search for testing.
+```ts
+// src/hooks.server.ts
+import { searchsocketHandle } from "searchsocket/sveltekit";
+import { env } from "$env/dynamic/private";
-```bash
-pnpm searchsocket search --q "turso vector search" --top-k 5 --rerank
+export const handle = searchsocketHandle({
+  rawConfig: {
+    project: { id: "canopy-website", baseUrl: "https://canopy.dev" },
+    source: { mode: "build" },
+    upstash: {
+      url: env.UPSTASH_VECTOR_REST_URL,
+      token: env.UPSTASH_VECTOR_REST_TOKEN
+    },
+    extract: {
+      dropSelectors: [".nav-blur", ".mobile-overlay", ".docs-sidebar"]
+    },
+    ranking: {
+      minScoreRatio: 0.70,
+      pageWeights: { "/": 0.95, "/download": 1.05, "/docs/**": 1.05 },
+      aggregationCap: 3,
+      aggregationDecay: 0.3
+    }
+  }
+});
+```
+### Search modal with scoped search
+```svelte
+<!-- SearchModal.svelte -->
+<script>
+  import { createSearchClient, buildResultUrl } from "searchsocket/client";
+  let { open = $bindable(false), pathPrefix = "", placeholder = "Search..." } = $props();
+  const client = createSearchClient();
+  let query = $state("");
+  let results = $state([]);
+  async function doSearch() {
+    if (!query.trim()) { results = []; return; }
+    const res = await client.search({
+      q: query,
+      topK: 8,
+      groupBy: "page",
+      pathPrefix: pathPrefix || undefined
+    });
+    results = res.results;
+  }
+</script>
+{#if open}
+  <dialog open>
+    <input bind:value={query} oninput={doSearch} {placeholder} />
+    {#each results as result}
+      <a href={buildResultUrl(result)} onclick={() => open = false}>
+        <strong>{result.title}</strong>
+        {#if result.sectionTitle}<span>— {result.sectionTitle}</span>{/if}
+        <p>{result.snippet}</p>
+      </a>
+    {/each}
+  </dialog>
+{/if}
 ```
-## MCP (Model Context Protocol)
+### Scroll-to-text in layout
-SearchSocket provides an **MCP server** for integration with Claude Code, Claude Desktop, and other MCP-compatible AI tools. This gives AI assistants direct access to your indexed site content for semantic search and page retrieval.
+```svelte
+<!-- src/routes/+layout.svelte -->
+<script>
+  import { afterNavigate } from "$app/navigation";
+  import { searchsocketScrollToText } from "searchsocket/sveltekit";
-### Tools
+  afterNavigate(searchsocketScrollToText);
+</script>
+```
-**`search(query, opts?)`**
-- Semantic search across indexed content
-- Returns ranked results with URL, title, snippet, score, and routeFile
-- Options: `scope`, `topK` (1-100), `pathPrefix`, `tags`, `groupBy` (`"page"` | `"chunk"`)
+### Deploy and index
-**`get_page(pathOrUrl, opts?)`**
-- Retrieve full indexed page content as markdown with frontmatter
-- Options: `scope`
+Indexing runs automatically on every Vercel deploy. Set these env vars in the Vercel dashboard:
-### Setup (Claude Code)
+- `UPSTASH_VECTOR_REST_URL`
+- `UPSTASH_VECTOR_REST_TOKEN`
+- `SEARCHSOCKET_AUTO_INDEX=1`
-Add a `.mcp.json` file to your project root (safe to commit — no secrets needed since the CLI auto-loads `.env`):
+The Vite plugin handles the rest. Alternatively, use a postbuild script:
 ```json
 {
-  "mcpServers": {
-    "searchsocket": {
-      "type": "stdio",
-      "command": "npx",
-      "args": ["searchsocket", "mcp"],
-      "env": {}
-    }
+  "scripts": {
+    "build": "vite build",
+    "postbuild": "searchsocket index"
   }
 }
 ```
-Restart Claude Code. The `search` and `get_page` tools will be available automatically. Verify with:
-```bash
-claude mcp list
-```
-### Setup (Claude Desktop)
-Add to `~/Library/Application Support/Claude/claude_desktop_config.json`:
+### Connect Claude Code to the deployed site
 ```json
 {
   "mcpServers": {
     "searchsocket": {
-      "command": "npx",
-      "args": ["searchsocket", "mcp"],
-      "cwd": "/path/to/your/project"
+      "type": "http",
+      "url": "https://canopy.dev/api/mcp"
     }
   }
 }
 ```
-Restart Claude Desktop. The tools appear in the MCP menu.
+Now Claude Code can search the live docs, retrieve page content, and find source files — all backed by the production index that stays current with every deploy.
-### HTTP Transport
+### Excluding pages from search
-For non-stdio clients, run the MCP server over HTTP:
-```bash
-npx searchsocket mcp --transport http --port 3338
+```svelte
+<!-- src/routes/blog/+page.svelte (archive page) -->
+<svelte:head>
+  <meta name="searchsocket-weight" content="0" />
+</svelte:head>
 ```
-This starts a stateless server at `http://127.0.0.1:3338/mcp`. Each POST request creates a fresh server instance with no session persistence.
+Or with the component:
-## Environment Variables
+```svelte
+<script>
+  import { SearchSocket } from "searchsocket/svelte";
+</script>
-The CLI automatically loads `.env` from the working directory on startup. Existing `process.env` values take precedence over `.env` file values. This only applies to CLI commands (`searchsocket index`, `searchsocket mcp`, etc.) — library imports like `searchsocketHandle()` rely on your framework's own `.env` handling (Vite/SvelteKit).
+<SearchSocket weight={0} />
+```
-### Required
+### Vite SSR config
+```ts
+// vite.config.ts
+import { sveltekit } from "@sveltejs/kit/vite";
+import { defineConfig } from "vite";
+export default defineConfig({
+  plugins: [sveltekit()],
+  ssr: {
+    external: ["searchsocket", "searchsocket/sveltekit", "searchsocket/client"]
+  }
+});
+```
-**Jina AI:**
-- `JINA_API_KEY` — Jina AI API key for embeddings and reranking
+## Environment Variables
-### Optional (Turso)
+### Required
-**Remote Turso (production):**
-- `TURSO_DATABASE_URL` — Turso database URL (e.g., `libsql://my-db.turso.io`)
-- `TURSO_AUTH_TOKEN` — Turso auth token
+| Variable | Description |
+|----------|-------------|
+| `UPSTASH_VECTOR_REST_URL` | Upstash Vector REST API endpoint |
+| `UPSTASH_VECTOR_REST_TOKEN` | Upstash Vector REST API token |
-If not set, uses local file DB at `.searchsocket/vectors.db`.
+### Optional
-### Optional (Scope/Build)
+| Variable | Description |
+|----------|-------------|
+| `SEARCHSOCKET_SCOPE` | Override scope (when `scope.mode: "env"`) |
+| `SEARCHSOCKET_AUTO_INDEX` | Enable build-triggered indexing (`1`, `true`, or `yes`) |
+| `SEARCHSOCKET_DISABLE_AUTO_INDEX` | Disable build-triggered indexing |
+| `SEARCHSOCKET_FORCE_REINDEX` | Force full re-index in CI/CD |
-- `SEARCHSOCKET_SCOPE` — Override scope (when `scope.mode: "env"`)
-- `SEARCHSOCKET_AUTO_INDEX` — Enable build-triggered indexing
-- `SEARCHSOCKET_DISABLE_AUTO_INDEX` — Disable build-triggered indexing
+The CLI automatically loads `.env` from the working directory on startup.
-## Configuration
+## Configuration Reference
-### Full Example
+See [docs/config.md](docs/config.md) for the full configuration reference. Here is a full example:
 ```ts
 export default {
@@ -806,41 +1058,24 @@ export default {
   },
   scope: {
-    mode: "git",           // "fixed" | "git" | "env"
+    mode: "git",                 // "fixed" | "git" | "env"
     fixed: "main",
     sanitize: true
   },
+  exclude: ["/admin/*", "/api/*"],
+  respectRobotsTxt: true,
   source: {
-    mode: "build",         // "static-output" | "crawl" | "content-files" | "build"
+    mode: "build",
     staticOutputDir: "build",
-    strictRouteMapping: false,
-    // Build mode (recommended for CI/CD)
     build: {
-      outputDir: ".svelte-kit/output",
-      previewTimeout: 30000,
       exclude: ["/api/*"],
       paramValues: {
         "/blog/[slug]": ["hello-world", "getting-started"]
       },
-      discover: false,
-      seedUrls: ["/"],
-      maxPages: 200,
-      maxDepth: 5
-    },
-    // Crawl mode (alternative)
-    crawl: {
-      baseUrl: "http://localhost:4173",
-      routes: ["/", "/docs", "/blog"],
-      sitemapUrl: "https://example.com/sitemap.xml"
-    },
-    // Content files mode (alternative)
-    contentFiles: {
-      globs: ["src/routes/**/*.md"],
-      baseDir: "."
+      discover: true,
+      maxPages: 200
     }
   },
@@ -850,77 +1085,77 @@ export default {
     dropSelectors: [".sidebar", ".toc"],
     ignoreAttr: "data-search-ignore",
     noindexAttr: "data-search-noindex",
-    respectRobotsNoindex: true
+    imageDescAttr: "data-search-description"
   },
   chunking: {
-    maxChars: 2200,
+    maxChars: 1500,
     overlapChars: 200,
     minChars: 250,
-    headingPathDepth: 3,
-    dontSplitInside: ["code", "table", "blockquote"],
-    prependTitle: true,       // prepend page title to chunk text before embedding
-    pageSummaryChunk: true    // generate synthetic identity chunk per page
-  },
-  embeddings: {
-    provider: "jina",
-    model: "jina-embeddings-v5-text-small",
-    apiKey: "jina_...",          // direct API key (or use apiKeyEnv)
-    apiKeyEnv: "JINA_API_KEY",
-    batchSize: 64,
-    concurrency: 4
+    prependTitle: true,
+    pageSummaryChunk: true
   },
-  vector: {
-    dimension: 1024,  // optional, inferred from first embedding
-    turso: {
-      url: "libsql://my-db.turso.io",    // direct URL (or use urlEnv)
-      authToken: "eyJhbGc...",            // direct token (or use authTokenEnv)
-      urlEnv: "TURSO_DATABASE_URL",
-      authTokenEnv: "TURSO_AUTH_TOKEN",
-      localPath: ".searchsocket/vectors.db"
-    }
+  upstash: {
+    urlEnv: "UPSTASH_VECTOR_REST_URL",
+    tokenEnv: "UPSTASH_VECTOR_REST_TOKEN"
   },
-  rerank: {
-    enabled: true,
-    topN: 20,
-    model: "jina-reranker-v3"
+  search: {
+    dualSearch: true,
+    pageSearchWeight: 0.3
   },
   ranking: {
     enableIncomingLinkBoost: true,
     enableDepthBoost: true,
-    pageWeights: {
-      "/": 1.1,
-      "/docs": 1.15
-    },
-    minScore: 0,
+    pageWeights: { "/docs": 1.15 },
+    minScoreRatio: 0.70,
     aggregationCap: 5,
-    aggregationDecay: 0.5,
-    minChunkScoreRatio: 0.5,
-    weights: {
-      incomingLinks: 0.05,
-      depth: 0.03,
-      rerank: 1.0,
-      aggregation: 0.1
-    }
+    aggregationDecay: 0.5
   },
   api: {
     path: "/api/search",
-    cors: {
-      allowOrigins: ["https://example.com"]
-    },
-    rateLimit: {
-      windowMs: 60_000,
-      max: 60
-    }
+    cors: { allowOrigins: ["https://example.com"] }
+  },
+  mcp: {
+    enable: true,
+    handle: { path: "/api/mcp" }
+  },
+  llmsTxt: {
+    enable: true,
+    title: "My Project",
+    description: "Documentation for My Project"
+  },
+  state: {
+    dir: ".searchsocket"
   }
 };
 ```
+## CI/CD
+See [docs/ci.md](docs/ci.md) for ready-to-use GitHub Actions workflows covering:
+- Main branch indexing on push
+- PR dry-run validation
+- Preview branch scope isolation
+- Scheduled scope pruning
+- Vercel build-triggered indexing
+## Further Reading
+- [Building a Search UI](docs/search-ui.md) — Cmd+K modals, scoped search, styling, and API reference
+- [Tuning Search Relevance](docs/tuning.md) — visual playground, ranking parameters, and search quality testing
+- [Configuration Reference](docs/config.md) — all config options, indexing hooks, and custom records
+- [CI/CD Workflows](docs/ci.md) — GitHub Actions and Vercel integration
+- [MCP over HTTP Guide](docs/mcp-claude-code.md) — detailed HTTP MCP setup for Claude Code
+- [Troubleshooting](docs/troubleshooting.md) — common issues, diagnostics, and FAQ
 ## License
 MIT