searchsocket 0.3.3 → 0.5.0

package/README.md CHANGED
@@ -6,17 +6,17 @@ Semantic site search and MCP retrieval for SvelteKit content projects.
 
  ## Features
 
- - **Embeddings**: Jina AI `jina-embeddings-v3` with task-specific LoRA adapters (configurable)
+ - **Embeddings**: Jina AI `jina-embeddings-v5-text-small` with task-specific LoRA adapters (configurable)
  - **Vector Backend**: Turso/libSQL with vector search (local file DB for development, remote for production)
- - **Rerank**: Optional Jina reranker — same API key, one boolean to enable
+ - **Rerank**: Jina `jina-reranker-v3` enabled by default — same API key
  - **Page Aggregation**: Group results by page with score-weighted chunk decay
  - **Meta Extraction**: Automatically extracts `<meta name="description">` and `<meta name="keywords">` for improved relevance
  - **SvelteKit Integrations**:
    - `searchsocketHandle()` for `POST /api/search` endpoint
    - `searchsocketVitePlugin()` for build-triggered indexing
- - **Client Library**: `createSearchClient()` for browser-side search
+ - **Client Library**: `createSearchClient()` for browser-side search, `buildResultUrl()` for scroll-to-section links
+ - **Scroll-to-Text**: `searchsocketScrollToText()` auto-scrolls to matching sections on navigation
  - **MCP Server**: Model Context Protocol tools for search and page retrieval
- - **Git-Tracked Markdown Mirror**: Commit-safe deterministic markdown outputs
 
  ## Install
 
@@ -163,7 +163,7 @@ pnpm searchsocket search --q "getting started" --top-k 5 --path-prefix /docs
    "meta": {
      "timingsMs": { "embed": 120, "vector": 15, "rerank": 0, "total": 135 },
      "usedRerank": false,
-     "modelId": "jina-embeddings-v3"
+     "modelId": "jina-embeddings-v5-text-small"
    }
  }
  ```
@@ -291,6 +291,53 @@ for (const result of response.results) {
  }
  ```
 
+ ## Scroll-to-Text Navigation
+
+ When a visitor clicks a search result, SearchSocket can automatically scroll them to the relevant section on the destination page. This uses two utilities:
+
+ ### `buildResultUrl(result)`
+
+ Builds a URL from a search result that includes:
+ - A `_ssk` query parameter for SvelteKit client-side navigation (read by `searchsocketScrollToText`)
+ - A [Text Fragment](https://developer.mozilla.org/en-US/docs/Web/URI/Fragment/Text_fragments) (`#:~:text=`) for native browser scroll-to-text on full page loads (Chrome 80+, Safari 16.1+, Firefox 131+)
+
+ Import from `searchsocket/client`:
+
+ ```ts
+ import { createSearchClient, buildResultUrl } from "searchsocket/client";
+
+ const client = createSearchClient();
+ const { results } = await client.search({ q: "installation" });
+
+ // Use in your search UI
+ for (const result of results) {
+   const href = buildResultUrl(result);
+   // "/docs/getting-started?_ssk=Installation#:~:text=Installation"
+ }
+ ```
+
+ If the result has no `sectionTitle`, the original URL is returned unchanged.
+
321
+ ### `searchsocketScrollToText`
322
+
323
+ A SvelteKit `afterNavigate` hook that reads the `_ssk` parameter and scrolls the matching heading into view. Add it to your root layout:
324
+
325
+ ```svelte
326
+ <!-- src/routes/+layout.svelte -->
327
+ <script>
328
+ import { afterNavigate } from '$app/navigation';
329
+ import { searchsocketScrollToText } from 'searchsocket/sveltekit';
330
+
331
+ afterNavigate(searchsocketScrollToText);
332
+ </script>
333
+ ```
334
+
335
+ The hook:
336
+ - Matches headings (h1–h6) case-insensitively with whitespace normalization
337
+ - Falls back to a broader text node search if no heading matches
338
+ - Scrolls smoothly to the first match
339
+ - Is a silent no-op when `_ssk` is absent or no match is found
340
+
  ## Vector Backend: Turso/libSQL
 
  SearchSocket uses **Turso** (libSQL) as its single vector backend, providing a unified experience across development and production.
@@ -399,7 +446,6 @@ The built-in `InMemoryRateLimiter` auto-disables on serverless platforms (it res
 
  The following features are only used during `searchsocket index` (CLI), not the search handler:
  - `ensureStateDirs` — creates `.searchsocket/` state directories
- - Markdown mirror — writes `.searchsocket/mirror/` files
  - Local SQLite fallback — only needed when `TURSO_DATABASE_URL` is not set
 
  ### Adapter Guidance
@@ -416,9 +462,9 @@ SearchSocket uses **Jina AI's embedding models** to convert text into semantic v
 
  ### Default Model
 
- - **Model**: `jina-embeddings-v3`
+ - **Model**: `jina-embeddings-v5-text-small`
  - **Dimensions**: 1024 (default)
- - **Cost**: ~$0.00002 per 1K tokens (generous 10M token free tier)
+ - **Cost**: ~$0.00005 per 1K tokens
  - **Task adapters**: Uses `retrieval.passage` for indexing, `retrieval.query` for search queries (LoRA task-specific adapters for better retrieval quality)
 
  ### How It Works
@@ -537,34 +583,6 @@ SEARCHSOCKET_AUTO_INDEX=1 pnpm build
  SEARCHSOCKET_DISABLE_AUTO_INDEX=1 pnpm build
  ```
 
- ## Git-Tracked Markdown Mirror
-
- Indexing writes a **deterministic markdown mirror**:
-
- ```
- .searchsocket/pages/<scope>/<path>.md
- ```
-
- Example:
- ```
- .searchsocket/pages/main/docs/intro.md
- ```
-
- Each file contains:
- - Frontmatter: URL, title, scope, route file, metadata
- - Markdown: Extracted content
-
- **Why commit it?**
- - Content workflows (edit markdown, regenerate embeddings)
- - Version control for indexed content
- - Debugging (see exactly what was indexed)
- - Offline search (grep the mirror)
-
- Add to `.gitignore` if you don't need it:
- ```
- .searchsocket/pages/
- ```
-
  ## Commands
 
  ### `searchsocket init`
@@ -612,7 +630,7 @@ pnpm searchsocket status
  # Output:
  # project: my-site
  # resolved scope: main
- # embedding model: jina-embeddings-v3
+ # embedding model: jina-embeddings-v5-text-small
  # vector backend: turso/libsql (local (.searchsocket/vectors.db))
  # vector health: ok
  # last indexed (main): 2025-02-23T10:30:00Z
@@ -865,7 +883,7 @@ export default {
 
    embeddings: {
      provider: "jina",
-     model: "jina-embeddings-v3",
+     model: "jina-embeddings-v5-text-small",
      apiKey: "jina_...", // direct API key (or use apiKeyEnv)
      apiKeyEnv: "JINA_API_KEY",
      batchSize: 64,
@@ -886,7 +904,7 @@ export default {
    rerank: {
      enabled: true,
      topN: 20,
-     model: "jina-reranker-v2-base-multilingual"
+     model: "jina-reranker-v3"
    },
 
    ranking: {