sweet-search 2.5.3 → 2.5.5
- package/NOTICE +5 -5
- package/README.md +579 -0
- package/core/search/context-expander.js +10 -1
- package/core/search/search-cli.js +1 -1
- package/core/search/search-pattern-planner.js +1 -1
- package/core/search/search-trace.js +1 -1
- package/eval/agent-read-workflows/bin/_ss-helpers.mjs +16 -3
- package/eval/agent-read-workflows/bin/ss-search +1 -1
- package/mcp/server.js +1 -1
- package/package.json +8 -8
package/NOTICE
CHANGED
|
@@ -7,10 +7,10 @@ This product includes software developed by Marko Sladojevic.
|
|
|
7
7
|
|
|
8
8
|
ATTRIBUTION NOTICE
|
|
9
9
|
|
|
10
|
-
Sweet Search is
|
|
11
|
-
|
|
12
|
-
|
|
13
|
-
|
|
10
|
+
Sweet Search is an agentic retrieval system.
|
|
11
|
+
It is highly competitive with native search (grep+Read)
|
|
12
|
+
and beats it in most cases in realized cost, tool call efficiency,
|
|
13
|
+
and answer usefulness per token, for most harness+model combos.
|
|
14
14
|
|
|
15
15
|
Original Author: Marko Sladojevic
|
|
16
16
|
Company: Panonit
|
|
@@ -19,5 +19,5 @@ Website: https://panonit.com
|
|
|
19
19
|
If you use Sweet Search in your project, please include this attribution
|
|
20
20
|
in your documentation, README, or application "About" section:
|
|
21
21
|
|
|
22
|
-
"Powered by Sweet Search - https://github.com/
|
|
22
|
+
"Powered by Sweet Search - https://github.com/mrsladoje/sweet-search"
|
|
23
23
|
"Created by Marko Sladojevic / Panonit"
|
package/README.md
ADDED
|
@@ -0,0 +1,579 @@
|
|
|
1
|
+
<div align="center">
|
|
2
|
+
|
|
3
|
+
<img src="assets/sweet-search-banner-pixelated.svg" alt="sweet-search" width="100%" />
|
|
4
|
+
|
|
5
|
+
### *Maybe grep isn't all you need…*
|
|
6
|
+
|
|
7
|
+
**A local-first hybrid code-search engine built for AI coding agents.**
|
|
8
|
+
Semantic + lexical + structural search over your working tree, GPU-accelerated local inference,
|
|
9
|
+
and an evolved system prompt that teaches your agent to use it all — even on plain CPU.
|
|
10
|
+
|
|
11
|
+
[](https://www.npmjs.com/package/sweet-search)
|
|
12
|
+
[](LICENSE)
|
|
13
|
+
[](package.json)
|
|
14
|
+
[](#platform-support)
|
|
15
|
+
[](#-gpu-accelerated-indexing-fully-local)
|
|
16
|
+
|
|
17
|
+
</div>
|
|
18
|
+
|
|
19
|
+
---
|
|
20
|
+
|
|
21
|
+
Your AI agent burns most of its tokens *looking* for code: grep, read, grep again, read more.
|
|
22
|
+
**sweet-search** replaces that loop with six purpose-built tools that return ranked, self-contained answers —
|
|
23
|
+
backed by a Rust/WASM engine, ColBERT-style late interaction, a code knowledge graph, and an index that
|
|
24
|
+
updates itself as you type.
|
|
25
|
+
|
|
26
|
+
<div align="center">
|
|
27
|
+
|
|
28
|
+
**10.2×** ripgrep's median grep speed · **2.9 ms** warm queries · **47×** faster reranking kernels · **0** API keys
|
|
29
|
+
|
|
30
|
+
<sub>measured in-repo — sources in [Benchmarks](#-benchmarks)</sub>
|
|
31
|
+
|
|
32
|
+
</div>
|
|
33
|
+
|
|
34
|
+
## ✨ Highlights
|
|
35
|
+
|
|
36
|
+
- **Hybrid retrieval** — BM25F lexical + dense semantic + structural graph signals, fused per query by a CatBoost router running in WASM
|
|
37
|
+
- **Agent-native by design** — token-budgeted output tiers, an MCP server, and a GEPA-evolved system prompt installed into Claude Code, Codex, Gemini CLI, and Cursor with one command
|
|
38
|
+
- **Indexed grep, ~10× ripgrep** — a sparse n-gram prefilter skips the files that provably can't match
|
|
39
|
+
- **ColBERT-style reranking, locally** — per-token MaxSim late interaction on hand-written SIMD kernels
|
|
40
|
+
- **Runs on anything** — Apple Metal, CUDA, CoreML Neural Engine, or plain CPU via INT8 ONNX; same engine, auto-selected
|
|
41
|
+
- **Never stale** — a reconcile daemon keeps the index converged with your *working tree*, uncommitted edits included
|
|
42
|
+
- **Fits in RAM** — INT4-quantized binary index segments and memory-mapped HNSW
|
|
43
|
+
- **Local-first** — all models run on-device; nothing is sent anywhere, ever
|
|
44
|
+
|
|
45
|
+
## 📚 Table of Contents
|
|
46
|
+
|
|
47
|
+
<table>
|
|
48
|
+
<tr>
|
|
49
|
+
<td width="22%" valign="top">
|
|
50
|
+
|
|
51
|
+
**GET STARTED**
|
|
52
|
+
|
|
53
|
+
[🚀 Quickstart](#-quickstart)<br>
|
|
54
|
+
<sub>three commands to a searchable repo</sub>
|
|
55
|
+
|
|
56
|
+
[🖥️ Platform Support](#platform-support)<br>
|
|
57
|
+
<sub>macOS · Linux · WASM fallback</sub>
|
|
58
|
+
|
|
59
|
+
</td>
|
|
60
|
+
<td width="27%" valign="top">
|
|
61
|
+
|
|
62
|
+
**USE IT**
|
|
63
|
+
|
|
64
|
+
[🧰 The Six Tools](#-the-six-tools)<br>
|
|
65
|
+
<sub>search · grep · find · semantic · trace · read</sub>
|
|
66
|
+
|
|
67
|
+
[🧠 The Evolved Agent Prompt](#-an-agent-prompt-that-was-evolved-not-written)<br>
|
|
68
|
+
<sub>GEPA-optimized search discipline</sub>
|
|
69
|
+
|
|
70
|
+
[🔌 Works With Your Agent](#-works-with-your-agent)<br>
|
|
71
|
+
<sub>MCP · Claude Code · Codex · Gemini · Cursor</sub>
|
|
72
|
+
|
|
73
|
+
</td>
|
|
74
|
+
<td width="27%" valign="top">
|
|
75
|
+
|
|
76
|
+
**UNDER THE HOOD**
|
|
77
|
+
|
|
78
|
+
[⚡ GPU-Accelerated Indexing](#-gpu-accelerated-indexing-fully-local)<br>
|
|
79
|
+
<sub>candle · fused kernels · cAST chunking</sub>
|
|
80
|
+
|
|
81
|
+
[🔄 An Index That Never Goes Stale](#-an-index-that-never-goes-stale)<br>
|
|
82
|
+
<sub>reconcile daemon tracks your working tree</sub>
|
|
83
|
+
|
|
84
|
+
[🦀 The Native Engine Room](#-the-native-engine-room)<br>
|
|
85
|
+
<sub>four Rust crates + TurboQuant compression</sub>
|
|
86
|
+
|
|
87
|
+
[🎯 The Ranking Stack](#-the-ranking-stack)<br>
|
|
88
|
+
<sub>route → retrieve → fuse → rerank → expand</sub>
|
|
89
|
+
|
|
90
|
+
</td>
|
|
91
|
+
<td width="24%" valign="top">
|
|
92
|
+
|
|
93
|
+
**THE RECEIPTS**
|
|
94
|
+
|
|
95
|
+
[📊 Benchmarks](#-benchmarks)<br>
|
|
96
|
+
<sub>full-corpus MRR, no distractor shortcuts</sub>
|
|
97
|
+
|
|
98
|
+
[🙏 Prior Art & Acknowledgements](#-prior-art--acknowledgements)<br>
|
|
99
|
+
<sub>the shoulders we stand on</sub>
|
|
100
|
+
|
|
101
|
+
[📄 License](#-license)<br>
|
|
102
|
+
<sub>Apache-2.0</sub>
|
|
103
|
+
|
|
104
|
+
</td>
|
|
105
|
+
</tr>
|
|
106
|
+
</table>
|
|
107
|
+
|
|
108
|
+
## 🚀 Quickstart
|
|
109
|
+
|
|
110
|
+
```bash
|
|
111
|
+
npm install -g sweet-search
|
|
112
|
+
|
|
113
|
+
cd your-repo
|
|
114
|
+
sweet-search init # one-time: downloads local models, wires up your agent
|
|
115
|
+
sweet-search index # builds the index — GPU-accelerated where available
|
|
116
|
+
|
|
117
|
+
sweet-search "where do we validate JWT tokens?"
|
|
118
|
+
```
|
|
119
|
+
|
|
120
|
+
That's it. `init` is idempotent and SHA256-verifies every model binary; re-running it is always safe.
|
|
121
|
+
From then on the index maintains itself — edit, save, search.
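For readers curious what the SHA256 verification mentioned above amounts to, here is a minimal sketch. It is an illustration only: `sha256_matches`, the throwaway file, and the pinned digest are hypothetical, and the actual `init` code is not shown in this README.

```python
import hashlib
import os
import tempfile


def sha256_matches(path: str, expected_hex: str, chunk_size: int = 1 << 20) -> bool:
    """Stream the file through SHA-256 and compare against a pinned digest."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        # Read in fixed-size blocks so large model files never sit fully in memory.
        for block in iter(lambda: fh.read(chunk_size), b""):
            digest.update(block)
    return digest.hexdigest() == expected_hex.lower()


# Demo with a throwaway file standing in for a model binary (hypothetical data).
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"fake model weights")
    demo_path = tmp.name

pinned = hashlib.sha256(b"fake model weights").hexdigest()
ok = sha256_matches(demo_path, pinned)          # digest matches the pinned value
tampered = sha256_matches(demo_path, "0" * 64)  # digest does not match
os.remove(demo_path)
```

Because the check depends only on file contents, repeating it on every `init` run is harmless, which fits the idempotency claim.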
|
|
122
|
+
|
|
123
|
+
> **Latest release: v2.5.5** — the agent-mode preview tier now defaults to a 3k token budget (was 4k):
|
|
124
|
+
> same accuracy and usefulness in a 4-model paired sweep, ~11–15% cheaper per query. Already on an
|
|
125
|
+
> older install? `npm install -g sweet-search` again to pick it up.
|
|
126
|
+
|
|
127
|
+
<details>
|
|
128
|
+
<summary><b>Setup options & details</b></summary>
|
|
129
|
+
|
|
130
|
+
```bash
|
|
131
|
+
sweet-search init --wizard # interactive: shows your hardware, recommends a model tier
|
|
132
|
+
sweet-search init --profile core # lexical-only, no model downloads (CI-friendly)
|
|
133
|
+
sweet-search init --li-model edge # compact late-interaction model for constrained machines
|
|
134
|
+
sweet-search uninstall # clean removal: models, caches, config — never your code
|
|
135
|
+
```
|
|
136
|
+
|
|
137
|
+
- **Requirements:** Node ≥ 18. macOS (arm64/x64) and Linux (x64/arm64) ship native binaries; other platforms fall back to WASM/JS automatically.
|
|
138
|
+
- **Footprint:** CPU-only hosts download a few hundred MB of INT8 models; GPU hosts add ~1.2 GB of FP32 backbones (skipped automatically where they'd be useless); M3+ Macs can additionally fetch a ~3.2 GB CoreML cascade for Neural Engine acceleration. Everything lands in `~/.cache/sweet-search/models/` and is used strictly on-device.
|
|
139
|
+
- **Agent wiring:** init injects the tool-routing system prompt into `CLAUDE.md` (and `AGENTS.md`, `GEMINI.md`, Cursor rules via flags), registers a session-start prewarm hook so your first query hits a warm daemon, and installs a `/sweet-index` skill in Claude Code.
|
|
140
|
+
- **What gets indexed:** what you'd expect — `.gitignore` is respected, `node_modules`/build dirs/minified artifacts are denied, files over 1 MB skipped, with a `.sweet-search-ignore` for extra rules.
|
|
141
|
+
|
|
142
|
+
</details>
|
|
143
|
+
|
|
144
|
+
## 📊 Benchmarks
|
|
145
|
+
|
|
146
|
+
> [!WARNING]
|
|
147
|
+
> ⚠️ **THESE NUMBERS ARE STALE — TREAT THEM AS A FLOOR, NOT THE CURRENT SCORE.** ⚠️
|
|
148
|
+
> Several results below were measured on builds that predate major accuracy work
|
|
149
|
+
> (late-interaction correctness fixes, HNSW tuning, the May 2026 ranking overhaul).
|
|
150
|
+
> Every benchmark is being re-run on the current engine and this table will be
|
|
151
|
+
> replaced with fresh numbers. Until then, expect the real results to be **higher**.
|
|
152
|
+
|
|
153
|
+
Every number below is the **`ss-search` pipeline end-to-end** — the same binary you install, querying
|
|
154
|
+
against the **full corpus** (no 99-distractor shortcuts), measured at 26–41 ms p50 on an M3 Max.
|
|
155
|
+
|
|
156
|
+
| Benchmark | What it tests | Queries | MRR@10 (×100) |
|
|
157
|
+
|-----------|---------------|--------:|-------:|
|
|
158
|
+
| **GenCodeSearchNet** | NL→code, 6 languages | 6,000 | **86.6** |
|
|
159
|
+
| **M2CRB** | multilingual NL→code (ES/PT/DE/FR → Py/Java/JS) | 2,814 | **60.2** |
|
|
160
|
+
| CoSQA (test split) | web queries → Python | 500 | 97.0 |
|
|
161
|
+
| CoSQA+ | web queries → Python, multi-match | 20,604 | 72.1 |
|
|
162
|
+
| CLARC | NL→C/C++ (systems code) | 1,245 | 67.4 |
|
|
163
|
+
| AdvTest † | adversarially renamed Python | 1,000 | 91.5 |
|
|
164
|
+
| CoIR † | 10 datasets, 14 languages | 4,500 | 57.3 |
|
|
165
|
+
|
|
166
|
+
**GenCodeSearchNet: the strongest result published anywhere, as far as we can tell.** The benchmark's
|
|
167
|
+
own paper reports MRR ≤ 0.42 for its best fine-tuned baselines (and ≤ 0.10 on the cross-lingual subsets),
|
|
168
|
+
with zero-shot OpenAI Ada-2 at 0.79–0.94 — and those are measured against **99 random distractors per
|
|
169
|
+
query**. sweet-search scores **0.866** (the 86.6 in the table, expressed on a 0–1 scale), retrieving from the entire 6,000-document corpus.
|
|
170
|
+
|
|
171
|
+
**M2CRB: best published number, no fine-tuning.** The benchmark paper's best model — a CodeBERT
|
|
172
|
+
*fine-tuned on the task's training mix* — reaches 52.7 (auMRRc, a metric averaged over smaller retrieval
|
|
173
|
+
pools). sweet-search reaches **60.2 full-corpus MRR@10 out of the box**, on Spanish, Portuguese, German,
|
|
174
|
+
and French queries.
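For reference, MRR@10 (the metric reported in the table) can be computed as below. This is the standard definition, not the project's evaluation harness; the function name and the toy rankings are made up for illustration.

```python
def mrr_at_k(ranked_results, gold, k=10):
    """Mean reciprocal rank: average of 1/rank of the first relevant hit in the top k.

    ranked_results: one ranked list of doc ids per query.
    gold: one set of relevant doc ids per query.
    """
    total = 0.0
    for ranking, relevant in zip(ranked_results, gold):
        for rank, doc_id in enumerate(ranking[:k], start=1):
            if doc_id in relevant:
                total += 1.0 / rank
                break  # only the first relevant hit counts
    return total / len(ranked_results)


# Toy data: hit at rank 1, hit at rank 3, and a miss.
rankings = [["a", "b", "c"], ["x", "y", "z"], ["p", "q", "r"]]
relevant = [{"a"}, {"z"}, {"missing"}]
score = mrr_at_k(rankings, relevant)  # (1 + 1/3 + 0) / 3
```

A query whose gold snippet falls outside the top 10 contributes zero, which is why ranking against the full corpus is harder than ranking against a small distractor pool.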
|
|
175
|
+
|
|
176
|
+
<details>
|
|
177
|
+
<summary><b>Methodology, staleness flags & systems numbers</b></summary>
|
|
178
|
+
|
|
179
|
+
- **Reproduction:** result artifacts live in `eval/results/`; rerun via `eval/run_all.js`.
|
|
180
|
+
- **Protocol note:** published baselines for GCSN and CoSQA-style benchmarks typically rank the gold snippet against 99 sampled distractors. All sweet-search numbers rank against the full benchmark corpus — strictly harder.
|
|
181
|
+
- **† Staleness:** AdvTest and CoIR were last run on the February 2026 build — before the late-interaction correctness fixes, HNSW tuning, and the May ranking work. They likely understate the current engine; re-runs are queued. CoSQA/M2CRB are from the April build; GCSN, CoSQA+, and CLARC are current (May 2026).
|
|
182
|
+
- **Honesty corner:** CrossCodeEval — cross-file *completion context* retrieval, a different task than NL search — sits at 0.12. We don't optimize for it and report it anyway.
|
|
183
|
+
- Dates and per-language breakdowns: [`docs/BENCHMARKS_EXPLAINED.md`](docs/BENCHMARKS_EXPLAINED.md).
|
|
184
|
+
|
|
185
|
+
Systems performance, measured in-repo:
|
|
186
|
+
|
|
187
|
+
| What | Result | Source |
|
|
188
|
+
|------|--------|--------|
|
|
189
|
+
| Indexed grep vs ripgrep | **10.2× faster** at the median (8.5–17.7× across 5 repos, 353 realistic queries, 1 ms p50 — identical match counts on every query) | [`docs/GREP_INDEXING_STRATEGY.md`](docs/GREP_INDEXING_STRATEGY.md) |
|
|
190
|
+
| Warm query latency (native CLI) | **2.9 ms** warm · 108 ms cold | [`docs/INIT_STRATEGY.md`](docs/INIT_STRATEGY.md) |
|
|
191
|
+
| MaxSim rerank kernels | **1.26 s → 27 ms** for a 231-candidate pass (47× native Rust; 16× WASM SIMD) | [`docs/MAXSIM_OPTIMIZATION.md`](docs/MAXSIM_OPTIMIZATION.md) |
|
|
192
|
+
| HNSW tuning for code | **−33%** search p50, **+5.9 pp** recall@200 | [`docs/HNSW_APPROACH.md`](docs/HNSW_APPROACH.md) |
|
|
193
|
+
| Indexing memory | peak JS heap **785 MB → 213 MB** | [`docs/DISK_FLUSHING_STRATEGY.md`](docs/DISK_FLUSHING_STRATEGY.md) |
|
|
194
|
+
| CoreML cascade (M3 Max) | **18% faster** full indexing vs the Metal baseline | [`docs/INIT_STRATEGY.md`](docs/INIT_STRATEGY.md) |
|
|
195
|
+
|
|
196
|
+
</details>
|
|
197
|
+
|
|
198
|
+
## 🧰 The Six Tools
|
|
199
|
+
|
|
200
|
+
Six small tools, one shared index. Each returns ranked, deduplicated, token-budgeted output designed
|
|
201
|
+
to be *consumed by an agent* — a useful answer, not a wall of matches to scroll through.
|
|
202
|
+
|
|
203
|
+
| Tool | What you give it | What you get back |
|
|
204
|
+
|------|------------------|-------------------|
|
|
205
|
+
| `ss-search` | a natural-language query | ranked, **self-contained code blocks** |
|
|
206
|
+
| `ss-grep` | an exact regex/literal | `file:line` hits, **ranked** |
|
|
207
|
+
| `ss-find` | a regex **+** a query | regex matches, **semantically re-ranked, as code blocks** |
|
|
208
|
+
| `ss-semantic` | a file **+** a question | just the **relevant spans** of that file |
|
|
209
|
+
| `ss-trace` | a symbol | **callers + callees + impact**, in one call |
|
|
210
|
+
| `ss-read` | a file (± line range) | exact bytes **+ symbol metadata** |
|
|
211
|
+
|
|
212
|
+
### `ss-search` — the full retrieval stack in one call
|
|
213
|
+
|
|
214
|
+
```bash
|
|
215
|
+
ss-search "how are websocket reconnects handled?" -k 5
|
|
216
|
+
```
|
|
217
|
+
|
|
218
|
+
One query fires the whole pipeline:
|
|
219
|
+
|
|
220
|
+
1. **CatBoost query router** — a 498-tree gradient-boosted classifier compiled to WASM decides lexical vs hybrid from 50 single-pass features (camelCase/snake_case decomposition, CJK density, path shape…) in microseconds, with a low-confidence reject option that falls back to max-recall hybrid. Real file paths short-circuit straight to lexical.
|
|
221
|
+
2. **Dual retrieval** — **BM25F** over field-weighted FTS5 (a hit on a function's *name* outweighs one buried in its body 10:1) runs in parallel with a **three-stage ANN cascade**: binary HNSW (Hamming distance over 64-byte binarized vectors, candidates in ~100 µs) → INT8 rescoring → full-precision float32 rescoring from a memory-mapped sidecar.
|
|
222
|
+
3. **Convex-combination fusion** with route-specific weights and quantile normalization — and an automatic **RRF** fallback when score distributions degenerate.
|
|
223
|
+
4. **Identifier-Anchored Retrieval (IAR)** — if your English mentions a real symbol, an exact-name lookup against the code graph injects that entity into the pool, even when the encoder ranked something tangential higher.
|
|
224
|
+
5. **Intent-aware reranking** — docs/tests/config demoted when you want implementation; log-scaled call-site reference boosts surface the function everyone actually calls.
|
|
225
|
+
6. **Adaptive graph expansion** — typed-edge walks (imports / extends / calls / uses) 1–2 hops out along the AST-derived knowledge graph, with intent-selected edge types, PathRAG-style flow-threshold pruning, and degree normalization so hub entities can't dominate.
|
|
226
|
+
7. **Late-interaction rerank** — ColBERT-style per-token MaxSim over the quantized token index, on kernels that took a 231-candidate scoring pass from **1.26 s to 27 ms**.
|
|
227
|
+
8. **Answer packaging** — near-duplicate siblings collapse to the best-matching member, MMR balances diversity, and entity-aware expansion emits *self-contained* blocks (whole functions with imports, docstrings, decorators) under an auto-selected **3k / 8k / 12k token budget** driven by post-ranking signals like top-1 dominance.
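The fusion step (item 3) can be sketched roughly as follows. Everything here is an assumption made for illustration: the weight of 0.4, the rank-quantile normalization, and the "all scores identical" degeneracy test are stand-ins, and the real router-specific weights and fallback criteria are not described in this README.

```python
import bisect


def quantile_normalize(scores):
    """Map each raw score to the fraction of scores that are <= it (range (0, 1])."""
    values = sorted(scores.values())
    n = len(values)
    return {doc: bisect.bisect_right(values, s) / n for doc, s in scores.items()}


def rrf(rankings, k=60):
    """Reciprocal Rank Fusion: sum of 1 / (k + rank) over all input rankings."""
    fused = {}
    for ranking in rankings:
        for rank, doc in enumerate(ranking, start=1):
            fused[doc] = fused.get(doc, 0.0) + 1.0 / (k + rank)
    return fused


def is_degenerate(scores):
    """Assumed degeneracy test: no spread at all in the score distribution."""
    return len(set(scores.values())) <= 1


def fuse(lexical, dense, w_lexical=0.4):
    """Convex combination of normalized scores, with a rank-based fallback."""
    if is_degenerate(lexical) or is_degenerate(dense):
        rankings = [sorted(s, key=s.get, reverse=True) for s in (lexical, dense)]
        fused = rrf(rankings)
    else:
        lex_q, dense_q = quantile_normalize(lexical), quantile_normalize(dense)
        docs = set(lex_q) | set(dense_q)
        fused = {
            d: w_lexical * lex_q.get(d, 0.0) + (1.0 - w_lexical) * dense_q.get(d, 0.0)
            for d in docs
        }
    return sorted(fused, key=fused.get, reverse=True)


# Hypothetical scores from the two retrievers.
lexical = {"a": 12.0, "b": 7.5, "c": 1.0}
dense = {"b": 0.91, "c": 0.80, "d": 0.10}
ranked = fuse(lexical, dense)
rrf_scores = rrf([["a", "b"], ["b", "c"]])
rrf_order = sorted(rrf_scores, key=rrf_scores.get, reverse=True)
```

Quantile normalization makes BM25 scores and cosine similarities comparable before weighting, while RRF ignores score magnitudes entirely, which is why it is a reasonable fallback when those magnitudes carry no information.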
|
|
228
|
+
|
|
229
|
+
<details>
|
|
230
|
+
<summary><b>More</b></summary>
|
|
231
|
+
|
|
232
|
+
- The expensive 8k/12k tiers are tuned to fire on roughly 1–5% of queries — the default case stays cheap. Force a tier with `--full` / `--xl`, or a mode with `--mode lexical|semantic|hybrid|pattern`.
|
|
233
|
+
- Also available as `sweet-search "<query>"` on the CLI and the `search` MCP tool.
|
|
234
|
+
|
|
235
|
+
</details>
|
|
236
|
+
|
|
237
|
+
### `ss-grep` — grep, minus every wasted millisecond
|
|
238
|
+
|
|
239
|
+
```bash
|
|
240
|
+
ss-grep "parseRetryAfter" -k 10
|
|
241
|
+
```
|
|
242
|
+
|
|
243
|
+
**10.2× faster than ripgrep end-to-end at the median** — measured across **353 realistic queries on 5 real repos**
|
|
244
|
+
(range 8.5–17.7× per repo, 1 ms p50), with **identical match counts on every single query**. Three things buy that:
|
|
245
|
+
|
|
246
|
+
- **A sparse n-gram index** (inspired by [Cursor's fast-regex-search](https://cursor.com/blog/fast-regex-search) and GitHub's Blackbird): instead of a fixed trigram table, gram boundaries adapt to *your* codebase's character-pair frequencies, so common trigrams get absorbed into longer, more selective grams.
|
|
247
|
+
- **Regex-AST literal extraction + SIMD intersection**: required substrings are pulled from the pattern's syntax tree, posting lists are intersected with NEON/SSE2 block merges (galloping search for skewed sizes), and only the files that *can* match — typically 0.1–5% of the corpus — see the real regex.
|
|
248
|
+
- **Fully in-process**: verification runs on Rust's regex crate with Rayon across all cores, inside the warm daemon, in a single NAPI call. No child process is ever spawned — zero fork/exec, zero pipe I/O, zero JSON re-parsing.
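The prefilter idea behind the bullets above can be illustrated with a deliberately simplified sketch. It uses fixed trigrams rather than the adaptive sparse grams described here, takes the required literal as an explicit argument instead of extracting it from the regex AST, and uses plain Python sets instead of SIMD merges. All names and the sample files are hypothetical.

```python
import re
from collections import defaultdict


def trigrams(text):
    """All overlapping 3-character substrings of text."""
    return {text[i:i + 3] for i in range(len(text) - 2)}


def build_index(files):
    """Map each trigram to the set of file names that contain it."""
    index = defaultdict(set)
    for name, content in files.items():
        for gram in trigrams(content):
            index[gram].add(name)
    return index


def indexed_grep(files, index, literal, pattern):
    """Intersect posting lists for the required literal, then run the real
    regex only on the surviving candidate files."""
    grams = trigrams(literal)
    if not grams:
        candidates = set(files)  # literal too short to prefilter: scan everything
    else:
        postings = sorted((index.get(g, set()) for g in grams), key=len)
        candidates = set.intersection(*postings)
    regex = re.compile(pattern)
    hits = sorted(name for name in candidates if regex.search(files[name]))
    return hits, len(candidates)


files = {
    "retry.js": "function parseRetryAfter(header) { return Number(header); }",
    "http.js": "const retry = parseRetryAfter(res.headers['retry-after']);",
    "util.js": "export function parseDate(s) { return new Date(s); }",
}
index = build_index(files)
hits, candidate_count = indexed_grep(files, index, "parseRetryAfter", r"parseRetryAfter\(")
```

In this toy corpus `util.js` is never handed to the regex engine because it lacks some of the literal's trigrams. The prefilter can only over-approximate the match set, so verification by the real regex keeps the results exact.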
|
|
249
|
+
|
|
250
|
+
Hits come back **ranked and scored**, so an agent can trust the top one and stop.
|
|
251
|
+
|
|
252
|
+
<details>
|
|
253
|
+
<summary><b>More</b></summary>
|
|
254
|
+
|
|
255
|
+
- Full methodology, per-repo table, and the optimization log: [`docs/GREP_INDEXING_STRATEGY.md`](docs/GREP_INDEXING_STRATEGY.md).
|
|
256
|
+
- Regexes with no extractable literals fall back to native grep over the indexed file set; fixed-string and glob queries use a ripgrep fallback.
|
|
257
|
+
|
|
258
|
+
</details>
|
|
259
|
+
|
|
260
|
+
### `ss-find` — ColGrep, on a faster engine
|
|
261
|
+
|
|
262
|
+
```bash
|
|
263
|
+
ss-find "token refresh logic" --regex "refresh.*[Tt]oken"
|
|
264
|
+
```
|
|
265
|
+
|
|
266
|
+
Inspired by LightOn's [ColGrep](https://github.com/lightonai/next-plaid/tree/main/colgrep) — regex precision,
|
|
267
|
+
semantically ranked — but rebuilt on our own substrate:
|
|
268
|
+
|
|
269
|
+
- The regex stage runs on the **same indexed sparse-gram engine as `ss-grep`** (in-process, no subprocess), not a filesystem scan.
|
|
270
|
+
- The ranking stage scores candidates with **per-token MaxSim over pre-indexed late-interaction embeddings** — no model inference over documents at query time — on our custom kernels: native Rust + Rayon takes a 231-candidate MaxSim pass from **1.26 s down to 27 ms** (WASM SIMD fallback at 16×).
|
|
271
|
+
- Regex tokens are merged into the semantic query, so the ranking sees both what you typed and what you matched.
|
|
272
|
+
- Like `ss-search`, it answers with **ranked, self-contained code snippets** — not bare `file:line` — so the find *and* the read collapse into one tool call. In our 30-question agent-workflow eval that eliminated **every follow-up read** and cut tokens **25.4%** vs a grep + read workflow, at quality parity (gap of 0.01 on a 5-point scale).
|
|
273
|
+
- On the 60-query pattern benchmark, MaxSim ranking lifts MRR@10 to **0.45** vs **0.11** for raw grep ordering, roughly a 4× gain in how high the right hit ranks.
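The MaxSim score used for ranking is the standard ColBERT late-interaction formula: for every query token, take the best similarity against any document token, then sum. A pure-Python sketch with made-up 2-d embeddings follows; the real kernels operate on quantized vectors with SIMD, which this does not attempt to model.

```python
def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def maxsim(query_tokens, doc_tokens):
    """Sum over query tokens of the best dot product against any doc token."""
    return sum(max(dot(q, d) for d in doc_tokens) for q in query_tokens)


# Hypothetical token embeddings (real ones are much higher-dimensional).
query = [[1.0, 0.0], [0.0, 1.0]]
candidates = {
    "doc_a": [[0.9, 0.1], [0.2, 0.8], [0.5, 0.5]],
    "doc_b": [[0.7, 0.3], [0.6, 0.4]],
}
scores = {name: maxsim(query, tokens) for name, tokens in candidates.items()}
best = max(scores, key=scores.get)
```

Because the document token embeddings are computed at index time, scoring a candidate at query time needs only dot products and no model inference over the document.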
|
|
274
|
+
|
|
275
|
+
<details>
|
|
276
|
+
<summary><b>More</b></summary>
|
|
277
|
+
|
|
278
|
+
- Requires the late-interaction index (built by default; `--li-model none` disables pattern mode).
|
|
279
|
+
- Also available as `sweet-search --mode pattern` and via the `search` MCP tool's `regex` argument.
|
|
280
|
+
|
|
281
|
+
</details>
|
|
282
|
+
|
|
283
|
+
### `ss-semantic` — hybrid retrieval, scoped to one file
|
|
284
|
+
|
|
285
|
+
```bash
|
|
286
|
+
ss-semantic src/auth/session.ts "where does the cookie get its expiry?"
|
|
287
|
+
```
|
|
288
|
+
|
|
289
|
+
You know the file; this finds the lines. Every indexed chunk of the file is scored by **three independent
|
|
290
|
+
signals** — BM25-style lexical term match, exact symbol-name match (weighted 1.5×), and ColBERT-style
|
|
291
|
+
MaxSim over late-interaction token embeddings — fused with **Reciprocal Rank Fusion** (k=60), with
|
|
292
|
+
symbol-less fragment chunks demoted 0.85× so real definitions win ties. The top spans are then
|
|
293
|
+
**re-read from disk** (±2 context lines, overlapping spans merged), so the answer is filesystem ground
|
|
294
|
+
truth even mid-edit; if the file is newer than its index entry you get an explicit staleness warning.
|
|
295
|
+
|
|
296
|
+
The useful answer: just the relevant spans with line numbers — not the whole file through your context window.
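The "±2 context lines, overlapping spans merged" step is a small interval-merging problem. A possible sketch is below; `merge_spans` is a hypothetical name, and treating directly adjacent spans as mergeable is an assumption rather than documented behavior.

```python
def merge_spans(spans, context=2, last_line=None):
    """Pad each (start, end) line span by `context` lines, then merge overlaps.

    Line numbers are 1-based and inclusive. `last_line` clamps padding to the
    end of the file when it is known.
    """
    padded = []
    for start, end in spans:
        lo = max(1, start - context)
        hi = end + context if last_line is None else min(last_line, end + context)
        padded.append((lo, hi))
    padded.sort()

    merged = []
    for lo, hi in padded:
        if merged and lo <= merged[-1][1] + 1:  # overlapping or directly adjacent
            merged[-1] = (merged[-1][0], max(merged[-1][1], hi))
        else:
            merged.append((lo, hi))
    return merged


# Two nearby spans collapse into one; the third is clamped at the file end.
result = merge_spans([(10, 12), (15, 18), (40, 41)], context=2, last_line=42)
```

Merging before re-reading from disk means each region of the file is returned once, even when several top-ranked chunks sit next to each other.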
|
|
297
|
+
|
|
298
|
+
<details>
|
|
299
|
+
<summary><b>More</b></summary>
|
|
300
|
+
|
|
301
|
+
- Unindexed files degrade gracefully to a plain read. Defaults: top 5 spans, relevance threshold 0.4, 8k-char cap.
|
|
302
|
+
- Also available as `sweet-search read-semantic` and the `read-semantic` MCP tool.
|
|
303
|
+
|
|
304
|
+
</details>
|
|
305
|
+
|
|
306
|
+
### `ss-trace` — graph algorithms, not grep guesswork
|
|
307
|
+
|
|
308
|
+
```bash
|
|
309
|
+
ss-trace processOrder --in src/orders/service.py
|
|
310
|
+
```
|
|
311
|
+
|
|
312
|
+
One call returns a symbol's **callers, callees, and transitive impact paths** from the AST-derived code
|
|
313
|
+
graph (entities + typed `calls`/`imports`/`extends`/`uses` edges, persisted in SQLite at index time).
|
|
314
|
+
Ranking fuses three signals:
|
|
315
|
+
|
|
316
|
+
- **Query-time Personalized PageRank** via Forward Push — a *local* algorithm that spreads mass directionally from your target symbol and touches only the neighborhood it reaches, never the whole graph;
|
|
317
|
+
- **Index-time edge-weighted global PageRank** (damping 0.85), precomputed into a `page_rank` column — a function called from five sites carries five units of mass, and it costs *zero* at query time;
|
|
318
|
+
- **Structural heuristics** — relationship type, depth, exported-API status, fan-in — with penalties for test-only and external paths.
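Forward Push, the local algorithm named in the first bullet, can be sketched as follows. This is the textbook push procedure on a toy call graph, not the project's implementation: the graph, the `epsilon` tolerance, and the handling of dangling nodes are assumptions, and `alpha` is the teleport probability (1 minus the 0.85 damping quoted above).

```python
from collections import defaultdict, deque


def forward_push_ppr(graph, source, alpha=0.15, epsilon=1e-4):
    """Approximate personalized PageRank from `source` using Forward Push.

    graph: dict mapping node -> list of out-neighbors.
    Only nodes whose residual exceeds epsilon * out-degree are ever processed,
    so work stays proportional to the neighborhood that is actually reached.
    """
    estimate = defaultdict(float)
    residual = defaultdict(float)
    residual[source] = 1.0
    queue = deque([source])
    queued = {source}

    while queue:
        node = queue.popleft()
        queued.discard(node)
        neighbors = graph.get(node, [])
        mass = residual[node]
        if mass <= epsilon * max(len(neighbors), 1):
            continue
        estimate[node] += alpha * mass  # keep the teleport share locally
        residual[node] = 0.0
        if not neighbors:
            continue  # dangling node: remaining mass is dropped in this sketch
        share = (1.0 - alpha) * mass / len(neighbors)
        for nxt in neighbors:
            residual[nxt] += share
            threshold = epsilon * max(len(graph.get(nxt, [])), 1)
            if residual[nxt] > threshold and nxt not in queued:
                queue.append(nxt)
                queued.add(nxt)
    return dict(estimate)


# Hypothetical call graph: edges point from caller to callee.
call_graph = {
    "processOrder": ["validate", "chargeCard"],
    "validate": ["logError"],
    "chargeCard": ["logError"],
    "logError": [],
    "unrelated": ["logError"],
}
ppr = forward_push_ppr(call_graph, "processOrder")
```

Running the same routine on the reversed graph would rank callers instead of callees. Note that `unrelated` never receives any mass, which is the sense in which the walk is local.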
|
|
319
|
+
|
|
320
|
+
Because the graph is prebuilt, the global ranking is precomputed, and the personalized walk is local,
|
|
321
|
+
a full three-section trace costs milliseconds. The relation word (`callers` / `callees` / `impact`)
|
|
322
|
+
re-weights how the response token budget is split; `--in` disambiguates duplicate names; `--depth`
|
|
323
|
+
bounds impact traversal (1–4).
|
|
324
|
+
|
|
325
|
+
<details>
|
|
326
|
+
<summary><b>More</b></summary>
|
|
327
|
+
|
|
328
|
+
- Honest caveat: call-graph extraction is precise but incomplete on highly dynamic code (bare-name dispatch, metaprogramming) — traces can be sparse there, and the agent prompt teaches a recovery strategy for exactly that case.
|
|
329
|
+
- Also available as `sweet-search trace` and the `trace` MCP tool.
|
|
330
|
+
|
|
331
|
+
</details>
|
|
332
|
+
|
|
333
|
+
### `ss-read` — exact bytes, with the index's knowledge attached
|
|
334
|
+
|
|
335
|
+
```bash
|
|
336
|
+
ss-read src/db/pool.js 120 180
|
|
337
|
+
```
|
|
338
|
+
|
|
339
|
+
A read tool that is **filesystem-grounded by construction**: bytes come straight from disk (never from
|
|
340
|
+
the index, so never stale), but each indexed file arrives annotated with its **cAST chunk metadata** —
|
|
341
|
+
symbol name, entity type, signature, line span — joined from the AST chunk index. The agent gets the
|
|
342
|
+
code *and* the structural map of what it's looking at in one call: cite, navigate, or trace next
|
|
343
|
+
without another search.
|
|
344
|
+
|
|
345
|
+
<details>
|
|
346
|
+
<summary><b>More</b></summary>
|
|
347
|
+
|
|
348
|
+
- The CLI/MCP form scales it up: `sweet-search read <file...>` (and the `read` MCP tool) batches **1–20 files in a single call**, each with the same symbol metadata — twenty files for the price of one tool invocation.
|
|
349
|
+
|
|
350
|
+
</details>
|
|
351
|
+
|
|
352
|
+
> The `ss-*` wrappers ship in the npm package and are what the installed agent prompt drives. Every
|
|
353
|
+
> capability is equally available as `sweet-search` CLI subcommands and as MCP tools — see
|
|
354
|
+
> [Works With Your Agent](#-works-with-your-agent).
|
|
355
|
+
|
|
356
|
+
## 🧠 An Agent Prompt That Was Evolved, Not Written
|
|
357
|
+
|
|
358
|
+
Giving an agent six tools is easy. Getting it to *stop grepping in circles* is not.
|
|
359
|
+
|
|
360
|
+
`sweet-search init` installs a ~1k-token system prompt that encodes a complete search discipline —
|
|
361
|
+
and it wasn't hand-written. It was **evolved with a GEPA-style optimization loop**: reflective mutation
|
|
362
|
+
by one model family, scored on a dual Pareto front (accuracy × cost) across two *different* production
|
|
363
|
+
targets, then validated on held-out probes and on **model families that were never part of the
|
|
364
|
+
optimization**, and finally hand-hardened with a correctness editing pass.
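To make the accuracy × cost selection concrete, here is a minimal sketch of extracting a Pareto front from candidate prompts. It shows a single front, whereas the text describes one per production target, and the candidate names and numbers are invented for illustration.

```python
def pareto_front(candidates):
    """Return the names of candidates not dominated on (higher accuracy, lower cost).

    A candidate is dominated if another one is at least as good on both axes
    and strictly better on at least one.
    """
    front = []
    for name, acc, cost in candidates:
        dominated = any(
            (o_acc >= acc and o_cost <= cost) and (o_acc > acc or o_cost < cost)
            for _, o_acc, o_cost in candidates
        )
        if not dominated:
            front.append(name)
    return front


# (name, accuracy, cost in tokens) for hypothetical prompt variants.
prompts = [
    ("prompt-a", 0.95, 1200),
    ("prompt-b", 0.97, 1500),
    ("prompt-c", 0.93, 1300),
    ("prompt-d", 0.97, 1400),
]
survivors = pareto_front(prompts)
```

Keeping the whole front, rather than a single best score, lets later mutation rounds start from both cheap and accurate variants.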
|
|
365
|
+
|
|
366
|
+
What it teaches:
|
|
367
|
+
|
|
368
|
+
- **Cheapest tool first** — hold an exact identifier? One `ss-grep`, trust the top hit, stop. No semantic search "just to confirm."
|
|
369
|
+
- **Trust the ranking** — confirm with at most one narrow read, never a re-run of a hit that already matched.
|
|
370
|
+
- **Absence is an answer** — two complementary empty probes (one semantic, one lexical) settle a negative; no third synonym, no `find`/`ls` spiral.
|
|
371
|
+
- **No raw-shell escape** — the #1 token-waster we found in trajectory analysis is agents abandoning the index for dozens of raw `grep`/`find` calls after one empty result. The prompt closes that door explicitly.
|
|
372
|
+
- **A reasoning checkpoint** — before a third probe, the agent must state what it has established and what its blind spot is.
|
|
373
|
+
|
|
374
|
+
<details>
|
|
375
|
+
<summary><b>How it was validated</b></summary>
|
|
376
|
+
|
|
377
|
+
- **Optimization targets:** two frontier model families in production harnesses (Claude Code and Codex-style CLIs), scored jointly so the prompt can't overfit to one model's quirks.
|
|
378
|
+
- **Selection:** dual Pareto fronts over per-probe accuracy and measured cost; candidates gated by paraphrase-invariance (the prompt's behavior must survive rewording).
|
|
379
|
+
- **Held-out discipline:** a dev probe set for iteration, a held-out set checked only at milestones, and a sealed vault set opened exactly once. Joint maximin on held-out: **0.988**; out-of-distribution probes: **0.95+**; vault: **0.963** — 2.5 pp below held-out, well inside the pre-registered 15% acceptance gate.
|
|
380
|
+
- **Held-out model families (HOMP):** the final prompt passed on two model families from different vendors that were never used during evolution — evidence the routing rules generalize, not memorize.
|
|
381
|
+
- All figures are from the in-repo evaluation program (internal probe suites; see [`docs/PHASE7.md`](docs/PHASE7.md)); the benchmark suite that will make these externally reproducible is in progress.
|
|
382
|
+
- Installation is idempotent and marker-delimited: re-running `init` updates the managed block in `CLAUDE.md` / `AGENTS.md` / `GEMINI.md` / `.cursor/rules` without touching anything else you wrote.
|
|
383
|
+
|
|
384
|
+
</details>
|
|
385
|
+
|
|
386
|
+
## ⚡ GPU-Accelerated Indexing, Fully Local
|
|
387
|
+
|
|
388
|
+
All inference is on-device, in Rust, via [candle](https://github.com/huggingface/candle) — with the
|
|
389
|
+
attention path swapped for **fused kernels tuned per backend**, and an honest CPU story for machines
|
|
390
|
+
with no accelerator at all.
|
|
391
|
+
|
|
392
|
+
| Your hardware | What runs |
|
|
393
|
+
|---------------|-----------|
|
|
394
|
+
| Apple Silicon (M1+) | candle **Metal**, BF16, fused SDPA attention |
|
|
395
|
+
| Apple Silicon (M3+) | … plus a **CoreML Neural Engine cascade** (~18% faster full indexing, measured on M3 Max) |
|
|
396
|
+
| NVIDIA GPU (SM 7.0+) | candle **CUDA**; **flash-attention** on Ampere+ |
|
|
397
|
+
| Anything else | **ONNX Runtime INT8** — optimized CPU path, ~139 MB embedding model, no GPU weights downloaded |
|
|
398
|
+
|
|
399
|
+
Before a single token is embedded, files are chunked by **[cAST](https://arxiv.org/abs/2506.15655)** —
|
|
400
|
+
structure-aware chunking over real **tree-sitter ASTs**. A recursive split-then-merge greedily packs
|
|
401
|
+
adjacent sibling AST nodes into a chunk until the size cap, and recurses *into* nodes too big to fit —
|
|
402
|
+
so every chunk is whole code: a function, a class, a contiguous run of declarations. Never a function
|
|
403
|
+
sliced mid-body, never a string split mid-literal. 14 languages get true AST grammars (JS/TS/TSX,
|
|
404
|
+
Python, Go, Rust, Java, C, C++, Ruby, PHP, Kotlin, Swift, C#); a 39-config regex registry extends
|
|
405
|
+
structure-aware chunking to 70+ file extensions beyond those. Each chunk carries its symbol name,
|
|
406
|
+
entity type, signature, and line span — the metadata that feeds the code graph, `ss-read`'s
|
|
407
|
+
annotations, and the self-contained answers everywhere else.
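The split-then-merge recursion can be illustrated on a toy tree. The `Node` class below is a stand-in for a tree-sitter node, sizes are arbitrary units, and the cap of 100 is hypothetical. This is a sketch of the idea, not the chunker that ships in the package.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Toy stand-in for an AST node: a label, a size, and child nodes."""
    label: str
    size: int
    children: list = field(default_factory=list)


def chunk_nodes(nodes, max_size):
    """Greedily pack adjacent siblings; recurse into nodes that exceed the cap."""
    chunks, current, current_size = [], [], 0

    def flush():
        nonlocal current, current_size
        if current:
            chunks.append(current)
            current, current_size = [], 0

    for node in nodes:
        if node.size > max_size and node.children:
            flush()  # keep source order: close the running chunk first
            chunks.extend(chunk_nodes(node.children, max_size))
        elif current_size + node.size > max_size:
            flush()
            # An oversized leaf with no children ends up as its own chunk here.
            current, current_size = [node.label], node.size
        else:
            current.append(node.label)
            current_size += node.size
    flush()
    return chunks


tree = [
    Node("import os", 10),
    Node("import re", 10),
    Node("class Big", 120, [Node("def a", 50), Node("def b", 40), Node("def c", 30)]),
    Node("def main", 30),
]
chunks = chunk_nodes(tree, max_size=100)
```

Chunk boundaries always fall between sibling nodes, so no function or declaration is cut in the middle unless it is itself larger than the cap and has no children to descend into.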
|
|
408
|
+
|
|
409
|
+
<details>
|
|
410
|
+
<summary><b>What's actually custom here</b></summary>
|
|
411
|
+
|
|
412
|
+
- **Surgical attention swap:** we vendor the upstream model implementations (NomicBERT for embeddings, ModernBERT for late interaction) and replace only the attention forward pass — an MLX-ported fused SDPA kernel on Metal, `candle-flash-attn` with varlen packing on CUDA Ampere+, and byte-for-byte upstream math on CPU so the fallback is provably identical.
|
|
413
|
+
- **A silent-NaN bug, found and fixed:** Apple's Metal SDPA kernel downcasts attention masks to F16, which saturates the standard `f32::MIN` mask to `-Inf` and quietly produces NaN on padded rows — collapsing retrieval quality. We clamp the mask and serialize Metal command-buffer submissions (concurrent submission corrupts outputs on shared queues). Details in [`crates/sweet-search-native/src/inference/`](crates/sweet-search-native/src/inference/).
|
|
414
|
+
- **CoreML cascade:** 18 pre-traced `.mlpackage` variants (bucketed by sequence length) dispatched to the Apple Neural Engine through an Objective-C shim; oversized batches fall through to Metal. Gated to M3+ because on M1/M2 the ANE doesn't beat its own compile overhead — we measured, so it's off there.
|
|
415
|
+
- **GPU off the event loop:** inference runs as napi `AsyncTask` on libuv worker threads, so tokenization and SQLite writes overlap GPU compute instead of stalling behind it.
|
|
416
|
+
- **Pipelined indexing:** while batch *N+1* embeds, batch *N*'s vectors stream into SQLite through zero-copy buffer views; full rebuilds write to a temp file and atomically swap, so a crash never leaves you serving half an index.
|
|
417
|
+
- **Models:** CodeRankEmbed (768-d, code-specialized) for embeddings; LateOn-Code (ModernBERT) for per-token late interaction, in a full-fidelity `standard` and a compact `edge` variant (~9× smaller FP32 backbone; ~2× smaller on the INT8 CPU path).
|
|
418
|
+
|
|
419
|
+
</details>
|
|
420
|
+
|
|
421
|
+
## 🔄 An Index That Never Goes Stale
|
|
422
|
+
|
|
423
|
+
Most code indexes rot the moment you start typing. sweet-search ships a **reconcile daemon** that
|
|
424
|
+
keeps every tier of the index converged with your **working tree** — uncommitted edits included —
|
|
425
|
+
without you ever running a command.
|
|
426
|
+
|
|
427
|
+
- **Save → searchable** at the next reconcile tick — auto-tuned per machine between 15 s and 300 s, typically 15–60 s on a warm, idle box
|
|
428
|
+
- **Tracks the filesystem, not git** — unstaged and uncommitted changes are first-class; deleted or newly-gitignored files disappear from results automatically
|
|
429
|
+
- **Atomic by construction** — every tick publishes all five index tiers (float HNSW, binary HNSW, late-interaction segments, sparse-gram, code graph) through a single fsync-renamed epoch manifest, so a query never sees a half-updated index
|
|
430
|
+
- **No-op edits cost almost nothing** — content hashing collapses byte-identical rewrites and editor touch events into skipped re-encoding work
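
The no-op collapse in the last bullet can be pictured with a small sketch. It assumes SHA-256 as the content hash and an in-memory map keyed by path; the README names neither, and `ChangeDetector` is a hypothetical name, not part of sweet-search.

```javascript
const { createHash } = require('node:crypto');

// Hash the raw bytes of a file; identical bytes always give the same digest.
function contentHash(bytes) {
  return createHash('sha256').update(bytes).digest('hex');
}

// Hypothetical helper: remembers the last hash seen per path.
class ChangeDetector {
  constructor() {
    this.hashes = new Map();
  }

  // Returns true only when the content differs from the last observed content,
  // so a byte-identical rewrite or a bare mtime touch is reported as unchanged.
  needsReindex(path, bytes) {
    const next = contentHash(bytes);
    if (this.hashes.get(path) === next) return false;
    this.hashes.set(path, next);
    return true;
  }
}
```

A caller would re-encode a file only when `needsReindex` returns `true`, which is what turns editor touch events into skipped work.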
|
|
431
|
+
|
|
432
|
+
<details>
|
|
433
|
+
<summary><b>Deep dive</b></summary>
|
|
434
|
+
|
|
435
|
+
- **Baseline gate:** the daemon never plays first-index-builder. It verifies a full-indexer fingerprint (epoch manifest + merkle config fingerprint + the vectors DB it names) before touching anything, and reports `waiting_for_initial_index` otherwise — no corrupted partial baselines.
|
|
436
|
+
- **One admission policy:** the full indexer and the reconciler share a single `createAdmissionPolicy` module (include globs → deny list → `.sweet-search-ignore` → 1 MB size cap → batched `git check-ignore`), so the two paths cannot drift.
|
|
437
|
+
- **Orphan sweep:** files that are deleted, newly excluded, or newly oversized get tombstoned across every tier; the index converges to exactly what a fresh full rebuild would produce.
|
|
438
|
+
- **Self-maintenance:** per-tier health watermarks (tombstone fraction, stale-doc ratio, delta ratio) schedule low-priority background compaction in a separate worker — the index stays fast over months without a manual rebuild.
|
|
439
|
+
- **Worktree-safe:** a worktree stamp plus a single-writer lockfile prevent two daemons from silently interleaving index histories across git worktrees.
|
|
440
|
+
- **Resource-polite:** ticks are budgeted (≤50 files / ≤2 s CPU per tick), run CPU-only (the GPU is reserved for cold full indexing), and the interval auto-tunes from load average, churn, and backlog.
|
|
441
|
+
- `sweet-search reconcile status` / `reconcile inspect <path>` explain exactly what the daemon thinks and why. Opt out any time with `SWEET_SEARCH_RECONCILE_V2=0`.
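
The "one admission policy" idea above amounts to an ordered, short-circuiting filter chain that both indexing paths call. The sketch below follows the stage order given in the bullet; everything else is an assumption for illustration: `createAdmissionSketch` is a hypothetical name (not the real `createAdmissionPolicy`), globs are reduced to extension checks, the 1 MB cap is taken as 1024 × 1024 bytes, and the ignore sets stand in for `.sweet-search-ignore` and `git check-ignore` lookups.

```javascript
const MAX_BYTES = 1024 * 1024; // assumed reading of the "1 MB size cap"

// Hypothetical factory: returns one admit() function shared by every caller.
function createAdmissionSketch({ includeExtensions, denyPrefixes, ignored, gitIgnored }) {
  // Each stage returns true when the file passes; order mirrors the README text.
  const stages = [
    ['include', (f) => includeExtensions.some((ext) => f.path.endsWith(ext))],
    ['deny-list', (f) => !denyPrefixes.some((prefix) => f.path.startsWith(prefix))],
    ['sweet-search-ignore', (f) => !ignored.has(f.path)],
    ['size-cap', (f) => f.size <= MAX_BYTES],
    ['git-check-ignore', (f) => !gitIgnored.has(f.path)],
  ];

  return function admit(file) {
    for (const [name, passes] of stages) {
      // Stop at the first failing stage and report which one rejected the file.
      if (!passes(file)) return { admitted: false, rejectedBy: name };
    }
    return { admitted: true, rejectedBy: null };
  };
}
```

Because both the full indexer and the reconciler would call the same `admit` function, a rule change lands in one place.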
|
|
442
|
+
|
|
443
|
+
</details>
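
The "fsync-renamed epoch manifest" mentioned earlier in this section is an instance of the usual write-temp, fsync, rename pattern. A minimal sketch follows, assuming a JSON manifest and a hypothetical `publishManifest` helper; the real manifest format and file names are not shown in this README. The sketch also omits the directory fsync that a fully durable rename would need.

```javascript
const fs = require('node:fs');
const path = require('node:path');

// Hypothetical helper: publish a manifest so readers see either the old file
// or the new one, never a partially written file.
function publishManifest(manifestPath, manifest) {
  const tmpPath = `${manifestPath}.tmp-${process.pid}`;
  const fd = fs.openSync(tmpPath, 'w');
  try {
    fs.writeSync(fd, JSON.stringify(manifest));
    fs.fsyncSync(fd); // flush the contents before the rename makes them visible
  } finally {
    fs.closeSync(fd);
  }
  // rename() replaces the target atomically on POSIX filesystems.
  fs.renameSync(tmpPath, manifestPath);
}
```

A reader that opens the manifest at any moment gets one complete epoch, which is why a single manifest can gate several index tiers at once.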
|
|
444
|
+
|
|
445
|
+
## 🦀 The Native Engine Room
|
|
446
|
+
|
|
447
|
+
Four Rust crates do the heavy lifting, each with a graceful fallback so the engine runs everywhere:
|
|
448
|
+
|
|
449
|
+
| Crate | What it does |
|-------|--------------|
| `sweet-search-native` | candle GPU/CPU inference, sparse-gram grep engine, SIMD posting-list intersection, SimHash/MinHash-LSH dedup, HuggingFace tokenizers, all over zero-copy NAPI |
| `wasm-maxsim` | a hand-written WASM SIMD kernel computing ColBERT MaxSim in ~4 KB (~1.6 KB gzipped), with fused INT8 dequantization inside the SIMD pipeline plus a 4-bit nibble-packed path |
| `wasm-router` | the 498-tree CatBoost query router, loop-unrolled, zero-allocation |
| `sweet-search-cli` | a native CLI that talks to a warm search daemon over a per-project Unix socket, with **2.9 ms** measured warm-path queries |
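
For readers unfamiliar with the MaxSim score that `wasm-maxsim` computes: for every query token, take the best dot product against any document token, then sum those maxima. The sketch below is a plain, unoptimized reference of that definition; it is not the project's kernel and ignores quantization.

```javascript
// Dot product of two equal-length numeric vectors.
function dot(a, b) {
  let sum = 0;
  for (let i = 0; i < a.length; i++) sum += a[i] * b[i];
  return sum;
}

// ColBERT-style MaxSim: sum over query tokens of the max similarity
// to any document token. Returns 0 when either side has no tokens.
function maxSim(queryTokens, docTokens) {
  if (queryTokens.length === 0 || docTokens.length === 0) return 0;
  let total = 0;
  for (const q of queryTokens) {
    let best = -Infinity;
    for (const d of docTokens) {
      const score = dot(q, d);
      if (score > best) best = score;
    }
    total += best;
  }
  return total;
}
```

With query tokens `[[1, 0], [0, 1]]` and document tokens `[[1, 0], [0.5, 0.5]]`, the per-token maxima are 1 and 0.5, so the score is 1.5.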
|
|
455
|
+
|
|
456
|
+
<details>
|
|
457
|
+
<summary><b>Deep dive</b></summary>
|
|
458
|
+
|
|
459
|
+
- **MaxSim, three speeds:** scoring auto-selects the best available tier — native Rust + Rayon across all cores (**47×** vs baseline JS in our microbenchmark), portable WASM SIMD (**16×**), or a norm-cached pure-JS fallback (3.5×). Equivalent rankings, any platform.
|
|
460
|
+
- **SIMD set intersection:** posting-list intersection dispatches per-pair — galloping search when one list is ≥8× smaller, 4-wide NEON/SSE2 block merges for balanced lists, scalar merge for small ones — following the Lemire/Clausecker line of work.
|
|
461
|
+
- **Dedup at index time:** near-duplicate chunks are fingerprinted (64-bit SimHash + 128-permutation MinHash), clustered with banded LSH + union-find, then *re-validated pairwise* against the exemplar so transitive weak links can't glue unrelated clusters together. Duplicates skip embedding entirely — and at query time the best-matching *sibling* can take the exemplar's slot, so collapsing copies never hides the right answer.
|
|
462
|
+
- **Per-project warm daemon:** the CLI derives an isolated socket path from an FNV-1a hash of the project root, auto-starts the server on first use, and falls back to pure JS where no native binary exists (measured: 2.9 ms warm / 108 ms cold / 64.7 ms JS fallback).
|
|
463
|
+
- **Native tokenization:** the official HuggingFace `tokenizers` crate over NAPI — batched, cached, no Python anywhere in the stack.
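
The per-pair dispatch described in the set-intersection bullet can be sketched in scalar JavaScript. The 8× threshold comes from the text; the function names are hypothetical, and the SIMD block merge is replaced here by an ordinary two-pointer merge, so this shows the control flow rather than the native implementation. Both inputs are assumed to be sorted ascending without duplicates.

```javascript
const GALLOP_RATIO = 8; // "galloping search when one list is >= 8x smaller"

// Two-pointer merge for lists of similar size.
function intersectMerge(a, b) {
  const out = [];
  let i = 0;
  let j = 0;
  while (i < a.length && j < b.length) {
    if (a[i] < b[j]) i++;
    else if (a[i] > b[j]) j++;
    else { out.push(a[i]); i++; j++; }
  }
  return out;
}

// Galloping: for each element of the small list, double the step through the
// large list until we overshoot, then binary-search inside that window.
function intersectGalloping(small, large) {
  const out = [];
  let lo = 0;
  for (const target of small) {
    if (lo >= large.length) break;
    let step = 1;
    while (lo + step < large.length && large[lo + step] < target) step *= 2;
    let left = lo;
    let right = Math.min(lo + step, large.length - 1);
    while (left < right) {
      const mid = (left + right) >>> 1;
      if (large[mid] < target) left = mid + 1;
      else right = mid;
    }
    if (large[left] === target) out.push(target);
    lo = left; // later targets are larger, so the search never moves backwards
  }
  return out;
}

// Dispatch per pair based on the size ratio.
function intersect(a, b) {
  const [small, large] = a.length <= b.length ? [a, b] : [b, a];
  return small.length * GALLOP_RATIO <= large.length
    ? intersectGalloping(small, large)
    : intersectMerge(small, large);
}
```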
|
|
464
|
+
|
|
465
|
+
</details>
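
The per-project socket derivation mentioned above relies on FNV-1a, which is small enough to show in full. The sketch below uses the standard 32-bit variant; the README does not say which width sweet-search uses, and the socket directory and file-name pattern are purely illustrative assumptions.

```javascript
// 32-bit FNV-1a over the UTF-8 bytes of a string.
function fnv1a32(text) {
  let hash = 0x811c9dc5; // FNV offset basis
  for (const byte of new TextEncoder().encode(text)) {
    hash ^= byte;
    hash = Math.imul(hash, 0x01000193) >>> 0; // multiply by the FNV prime, keep unsigned
  }
  return hash >>> 0;
}

// Hypothetical naming scheme: one socket per project root.
function socketPathFor(projectRoot, baseDir = '/tmp') {
  const digest = fnv1a32(projectRoot).toString(16).padStart(8, '0');
  return `${baseDir}/sweet-search-${digest}.sock`;
}
```

The same project root always maps to the same path, so a second CLI invocation finds the daemon that the first one started.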
|
|
466
|
+
|
|
467
|
+
### 🗜️ TurboQuant: an index that fits in RAM
|
|
468
|
+
|
|
469
|
+
A 17k-document codebase's late-interaction index weighed **1.34 GiB** as JSON-encoded INT8. The binary
|
|
470
|
+
segment format cut the same index to **~396 MiB**, a 3.4× reduction that was pure ASCII encoding overhead, and the INT4
|
|
471
|
+
default packs token vectors at half a byte each on top of that. Laptop-sized, fully in RAM.
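
"Half a byte each" means two 4-bit codes per byte. A minimal sketch of per-vector min/scale quantization with nibble packing follows; it assumes 16 uniform levels between the vector's min and max and low-nibble-first packing, neither of which is specified here, and the function names are hypothetical.

```javascript
// Quantize one token vector to 4-bit codes, two codes per byte.
function quantizeInt4(vector) {
  let min = Infinity;
  let max = -Infinity;
  for (const x of vector) {
    if (x < min) min = x;
    if (x > max) max = x;
  }
  // 16 levels (0..15); fall back to scale 1 for constant or empty vectors.
  const scale = max > min ? (max - min) / 15 : 1;
  const packed = new Uint8Array(Math.ceil(vector.length / 2));
  for (let i = 0; i < vector.length; i++) {
    const q = Math.min(15, Math.max(0, Math.round((vector[i] - min) / scale)));
    // Even indices go in the low nibble, odd indices in the high nibble.
    packed[i >> 1] |= (i & 1) ? (q << 4) : q;
  }
  return { min, scale, length: vector.length, packed };
}

// Reverse the packing: x is approximated by min + code * scale.
function dequantizeInt4({ min, scale, length, packed }) {
  const out = new Float32Array(length);
  for (let i = 0; i < length; i++) {
    const byte = packed[i >> 1];
    const q = (i & 1) ? (byte >> 4) : (byte & 0x0f);
    out[i] = min + q * scale;
  }
  return out;
}
```

The reconstruction error per component is bounded by half a quantization step, `scale / 2`.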
|
|
472
|
+
|
|
473
|
+
<details>
|
|
474
|
+
<summary><b>Deep dive</b></summary>
|
|
475
|
+
|
|
476
|
+
- **INT4 by default:** per-token min/scale quantization with nibble packing (two values per byte), A/B-tested against the INT8 baseline with no meaningful retrieval regression before becoming the default.
|
|
477
|
+
- **SSLX binary segments:** the index persists as ~10k-document binary segment files with structured headers and CRC32 footers — a crash costs you at most one segment, not the index.
|
|
478
|
+
- **Three-stage retrieval:** a binary HNSW (Hamming distance over 64-byte binarized vectors, ~32× smaller than float HNSW) produces candidates in ~100 µs, INT8 rescoring narrows them, and a float32 sidecar rescores the final pool — speed without giving up top-result quality.
|
|
479
|
+
- **Memory-mapped HNSW:** the float graph index loads via `mmap` (USearch `view()`), contributing **0 MB** to the V8 heap at search time; the OS reclaims pages under pressure.
|
|
480
|
+
- **Streaming indexer:** vectors stream from SQLite cursors instead of materializing in arrays — peak JS heap during indexing dropped from ~785 MB to ~213 MB, with 30-second fsync-ordered checkpoints bounding crash loss. The OOM cliff that used to appear above ~200k chunks is gone; large repos index comfortably on an 8 GB machine.
|
|
481
|
+
- Tuned HNSW parameters and zero-GC search internals (typed-array heaps, generation-stamped visited lists) cut search p50 by 33% while *raising* recall@200 by 5.9 pp in our internal evaluation ([`docs/HNSW_APPROACH.md`](docs/HNSW_APPROACH.md)).
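
The first stage of the three-stage retrieval above compares binarized vectors by Hamming distance. A minimal sketch, assuming sign binarization (bit set when the component is positive) and a byte-wise popcount table; the project's actual binarization rule and bit layout are not described here.

```javascript
// One bit per dimension: 512 dimensions fit in 64 bytes.
function binarize(vector) {
  const bits = new Uint8Array(Math.ceil(vector.length / 8));
  for (let i = 0; i < vector.length; i++) {
    if (vector[i] > 0) bits[i >> 3] |= 1 << (i & 7);
  }
  return bits;
}

// Precomputed number of set bits for every byte value.
const POPCOUNT = Uint8Array.from({ length: 256 }, (_, byte) => {
  let count = 0;
  for (let x = byte; x !== 0; x >>= 1) count += x & 1;
  return count;
});

// Hamming distance: count the bits that differ between two equal-length codes.
function hamming(a, b) {
  let distance = 0;
  for (let i = 0; i < a.length; i++) distance += POPCOUNT[a[i] ^ b[i]];
  return distance;
}
```

Candidates found this way are cheap to score, which is why the later INT8 and float32 stages only need to rescore a small pool.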
|
|
482
|
+
|
|
483
|
+
</details>
|
|
484
|
+
|
|
485
|
+
## 🎯 The Ranking Stack
|
|
486
|
+
|
|
487
|
+
Retrieval quality comes from *layers*, each one cheap, each one earning its place:
|
|
488
|
+
|
|
489
|
+
1. **Route** — CatBoost classifies the query (lexical / semantic / hybrid) and sets fusion weights; real file paths short-circuit straight to lexical
|
|
490
|
+
2. **Retrieve** — BM25F field-weighted lexical (a match on a function's *name* outranks one buried in a body) in parallel with the three-stage vector pipeline
|
|
491
|
+
3. **Fuse** — convex combination with per-route weights and quantile normalization, falling back to Reciprocal Rank Fusion on degenerate score distributions
|
|
492
|
+
4. **Anchor** — name a real symbol in your query and identifier-anchored retrieval injects the exact-name entity, even when the encoder ranked something tangential higher
|
|
493
|
+
5. **Rerank** — ColBERT-style MaxSim late interaction over the quantized token index
|
|
494
|
+
6. **Expand** — typed-edge graph walks (1–2 hops, intent-adaptive, PathRAG-style flow pruning) pull in the related code a single chunk can't show
|
|
495
|
+
7. **Polish** — intent-aware demotion of docs/tests/config when you want implementation, call-site reference boosts, MMR diversity, near-duplicate sibling re-ranking
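
The Fuse step can be sketched as follows. This is an illustration under stated assumptions, not the shipped logic: quantile normalization is taken to mean mapping each score to its empirical rank quantile, a "degenerate" distribution is taken to mean fewer than two hits or all scores equal, RRF uses the commonly cited constant k = 60, and the function names are hypothetical.

```javascript
// Reciprocal Rank Fusion: each list contributes 1 / (k + rank), rank starting at 1.
function rrf(rankedLists, k = 60) {
  const fused = new Map();
  for (const list of rankedLists) {
    list.forEach((id, index) => {
      fused.set(id, (fused.get(id) ?? 0) + 1 / (k + index + 1));
    });
  }
  return [...fused.entries()].sort((a, b) => b[1] - a[1]).map(([id]) => id);
}

// Map each score to its empirical quantile in [0, 1]; ties share the lowest rank.
function quantileNormalize(hits) {
  const sorted = hits.map((h) => h.score).sort((a, b) => a - b);
  const denom = Math.max(1, sorted.length - 1);
  return new Map(hits.map((h) => [h.id, sorted.indexOf(h.score) / denom]));
}

// Assumed degeneracy test: nothing to normalize when all scores coincide.
function isDegenerate(hits) {
  return hits.length < 2 || hits.every((h) => h.score === hits[0].score);
}

// Convex combination of normalized scores, with an RRF fallback.
function fuse(lexical, semantic, lexicalWeight = 0.5) {
  if (isDegenerate(lexical) || isDegenerate(semantic)) {
    const ranked = (hits) => [...hits].sort((a, b) => b.score - a.score).map((h) => h.id);
    return rrf([ranked(lexical), ranked(semantic)]);
  }
  const lex = quantileNormalize(lexical);
  const sem = quantileNormalize(semantic);
  const ids = new Set([...lex.keys(), ...sem.keys()]);
  return [...ids]
    .map((id) => ({
      id,
      score: lexicalWeight * (lex.get(id) ?? 0) + (1 - lexicalWeight) * (sem.get(id) ?? 0),
    }))
    .sort((a, b) => b.score - a.score)
    .map((h) => h.id);
}
```

In this sketch the router's per-route weights would enter through `lexicalWeight`: a lexical route pushes it toward 1, a semantic route toward 0.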
|
|
496
|
+
|
|
497
|
+
<details>
|
|
498
|
+
<summary><b>Deep dive & design honesty</b></summary>
|
|
499
|
+
|
|
500
|
+
- **Intent awareness:** a lightweight classifier distinguishes "fix this crash" from "how do I use this API" and tunes graph-edge selection, result limits, and chunk-type preferences per intent.
|
|
501
|
+
- **Quality priors:** each chunk carries a 0–1 prior from test proximity, git recency, symbol centrality (PageRank), comment density, and complexity — production code surfaces, stale fixtures sink.
|
|
502
|
+
- **Community structure:** a canonical Leiden algorithm detects code communities on the entity graph at index time, feeding vocabulary prewarming and structural signals — the engine understands your modules, not just your directories.
|
|
503
|
+
- **Multilingual:** 14 languages get full tree-sitter AST treatment; a 39-config registry covers 70+ extensions beyond that; router features handle camelCase/snake_case decomposition, CJK density, and German compounds.
|
|
504
|
+
- **Long-query rescue:** wordy natural-language queries that FTS5 would tokenize into an unsatisfiable AND get a multi-query BM25F + RRF fallback — one query per content keyword, fused.
|
|
505
|
+
- **A negative result we ship anyway:** we built a full cross-encoder rerank cascade behind an adaptive confidence gate and measured it on our evaluation sets. It did not beat MaxSim, and it cost 3× the latency, so it ships **disabled** (set `SWEET_SEARCH_CASCADE_ENABLED=true` if you want to try it). We'd rather ship the faster path than a fancier diagram.
|
|
506
|
+
|
|
507
|
+
</details>
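
The MMR diversity pass named in the Polish step trades relevance against redundancy. A minimal sketch of greedy Maximal Marginal Relevance follows, assuming cosine similarity between chunk vectors and a hypothetical `lambda` default of 0.7; the project's actual similarity measure and weighting are not documented here.

```javascript
// Cosine similarity; returns 0 if either vector has zero length.
function cosine(a, b) {
  let dotProduct = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dotProduct += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return normA === 0 || normB === 0 ? 0 : dotProduct / Math.sqrt(normA * normB);
}

// Greedy MMR: repeatedly pick the candidate that maximizes
// lambda * relevance - (1 - lambda) * (max similarity to anything already picked).
function mmr(candidates, k, lambda = 0.7) {
  const remaining = [...candidates];
  const selected = [];
  while (selected.length < k && remaining.length > 0) {
    let bestIndex = 0;
    let bestScore = -Infinity;
    remaining.forEach((candidate, index) => {
      let redundancy = 0;
      for (const chosen of selected) {
        redundancy = Math.max(redundancy, cosine(candidate.vector, chosen.vector));
      }
      const score = lambda * candidate.relevance - (1 - lambda) * redundancy;
      if (score > bestScore) {
        bestScore = score;
        bestIndex = index;
      }
    });
    selected.push(remaining.splice(bestIndex, 1)[0]);
  }
  return selected;
}
```

With `lambda = 1` this reduces to plain relevance ordering; lower values push near-identical chunks further down the list.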
|
|
508
|
+
|
|
509
|
+
## 🔌 Works With Your Agent
|
|
510
|
+
|
|
511
|
+
sweet-search meets your agent wherever it is — shell tools, MCP, or injected instructions:
|
|
512
|
+
|
|
513
|
+
```jsonc
// .claude/mcp.json: that's the whole integration
{
  "mcpServers": {
    "sweet-search": {
      "command": "npx",
      "args": ["sweet-search-mcp", "--project-root", "/absolute/path/to/your/repo"]
    }
  }
}
```
|
|
524
|
+
|
|
525
|
+
- **MCP server** — 8 tools (`search`, `trace`, `read`, `read-semantic`, `index`, `health`, `repo-map`, `vocab-prewarm`), 2 resources, 2 prompts; all search tools declared read-only and idempotent
|
|
526
|
+
- **Harness injection** — `init` writes the evolved system prompt into Claude Code, Codex (`--codex`, including session hooks), Gemini CLI (`--gemini`), and Cursor (`--cursor`) from one canonical source
|
|
527
|
+
- **Repo maps for sub-agents** — the `repo-map` tool returns a PageRank-ranked symbol overview squeezed into any token budget, perfect for briefing a delegated agent
|
|
528
|
+
- **Warm from the first query** — a SessionStart hook pre-launches the search daemon so models, vocabulary, and indexes are loaded before you ask anything
|
|
529
|
+
|
|
530
|
+
<details>
|
|
531
|
+
<summary><b>Deep dive</b></summary>
|
|
532
|
+
|
|
533
|
+
- **Tool routing enforcement (opt-in):** `init --enforce-tools` denies the native Grep tool in Claude Code and installs a hint hook nudging native Read toward `ss-read`/`ss-semantic` — for when you want the discipline guaranteed, not suggested.
|
|
534
|
+
- **`/sweet-index` skill:** a Claude Code slash command for a full GPU-aware reindex, installed by init.
|
|
535
|
+
- **Vocabulary prewarm:** `sweet-search prewarm-vocab` mines your repo's real identifiers, detects code communities (Leiden), and pre-warms all three search modes so even the first semantic query of a session is cache-warm.
|
|
536
|
+
- **Honest committed-state:** init never writes machine-specific absolute paths into committed settings files, and all instruction injection is marker-delimited and reversible.
|
|
537
|
+
|
|
538
|
+
</details>
|
|
539
|
+
|
|
540
|
+
<a id="platform-support"></a>
|
|
541
|
+
|
|
542
|
+
## 🖥️ Platform Support
|
|
543
|
+
|
|
544
|
+
| Platform | Engine | Acceleration |
|----------|--------|--------------|
| macOS arm64 (Apple Silicon) | native | Metal (M1+) · CoreML Neural Engine (M3+) |
| macOS x64 (Intel) | native | ONNX Runtime INT8 CPU |
| Linux x64 (glibc) | native | CUDA (SM 7.0+, flash-attn on Ampere+) or INT8 CPU |
| Linux arm64 (glibc) | native | CUDA (Jetson Orin / Grace) or INT8 CPU |
| Windows | n/a | via WSL2 (= Linux x64) |
| Everything else | WASM/JS fallback | runs everywhere Node ≥ 18 runs |
|
|
552
|
+
|
|
553
|
+
Native binaries are selected automatically at `npm install` time via optionalDependencies — no flags, no postinstall scripts to debug. Every native fast path has a WASM or JS fallback that produces the same results.
|
|
554
|
+
|
|
555
|
+
## 🙏 Prior Art & Acknowledgements
|
|
556
|
+
|
|
557
|
+
sweet-search stands on a lot of shoulders, and we'd rather name them than pretend otherwise:
|
|
558
|
+
|
|
559
|
+
- **[ColBERT](https://arxiv.org/abs/2004.12832)** (Khattab & Zaharia) — late interaction; **[LightOn](https://huggingface.co/lightonai)** for the LateOn-Code models and the ColGrep concept our pattern mode parallels
|
|
560
|
+
- **[ripgrep](https://github.com/BurntSushi/ripgrep)** (BurntSushi) — the bar for grep, and our verification baseline
|
|
561
|
+
- **GitHub's [Blackbird](https://github.blog/engineering/the-technology-behind-githubs-new-code-search/)** — the sparse n-gram indexing idea we tuned per-codebase
|
|
562
|
+
- **[candle](https://github.com/huggingface/candle)** & **[MLX](https://github.com/ml-explore/mlx)** — Rust ML and the fused SDPA kernels we build on; **[HuggingFace tokenizers](https://github.com/huggingface/tokenizers)**
|
|
563
|
+
- **[Aider](https://github.com/Aider-AI/aider)** — the repo-map idea, here rebuilt on a real knowledge graph
|
|
564
|
+
- **[USearch](https://github.com/unum-cloud/usearch)** — memory-mapped HNSW; **Malkov & Yashunin** for [HNSW](https://arxiv.org/abs/1603.09320) itself
|
|
565
|
+
- **[CatBoost](https://catboost.ai/)** — the query router model; **Traag et al.** for the [Leiden algorithm](https://arxiv.org/abs/1810.08473); **Cormack et al.** for RRF; **[PathRAG](https://arxiv.org/abs/2502.14902)** for flow-pruned graph expansion; **[cAST](https://arxiv.org/abs/2506.15655)** for structure-aware chunking
|
|
566
|
+
- **[GEPA](https://arxiv.org/abs/2507.19457)** — the reflective evolutionary prompt-optimization paradigm behind our agent prompt
|
|
567
|
+
- **[nomic-ai](https://huggingface.co/nomic-ai)** — the CodeRankEmbed embedding model
|
|
568
|
+
|
|
569
|
+
## 📄 License
|
|
570
|
+
|
|
571
|
+
[Apache-2.0](LICENSE) © [PanonIT](https://panonit.com)
|
|
572
|
+
|
|
573
|
+
---
|
|
574
|
+
|
|
575
|
+
<div align="center">
|
|
576
|
+
|
|
577
|
+
**If sweet-search saves your agent's tokens, a ⭐ helps other agents' humans find it.**
|
|
578
|
+
|
|
579
|
+
</div>
|
|
@@ -1677,8 +1677,17 @@ function resolveSubMode(format) {
|
|
|
1677
1677
|
// space, and small-N entropy is dominated by the 1/log(n) denominator and
|
|
1678
1678
|
// stops being a reliable distribution-width signal.
|
|
1679
1679
|
|
|
1680
|
+
// Preview-tier budget: 3000 (was 4000 until 2026-06-11). The 4-model budget
|
|
1681
|
+
// sweep (DeepSeek/MiMo/GPT-5.5-codex/Opus-CC, 12 dev probes, paired vs 4k)
|
|
1682
|
+
// found 3k keeps every accuracy/usefulness metric flat-to-up with zero
|
|
1683
|
+
// call-compensation, and cuts realized cost −11–15% on the flagship cells.
|
|
1684
|
+
// Below 3k, flagship models re-buy the trimmed context with extra calls
|
|
1685
|
+
// (Opus calls Δ: 3k −0.08 → 2.8k +0.33 → 2.5k +0.67 → 2k +0.83), erasing
|
|
1686
|
+
// the savings — 3k is the floor. SWEET_SEARCH_PREVIEW_BUDGET overrides for
|
|
1687
|
+
// experiments; full/xl escalation tiers are unchanged.
|
|
1688
|
+
const PREVIEW_TIER_BUDGET = Number(process.env.SWEET_SEARCH_PREVIEW_BUDGET || '') || 3000;
|
|
1680
1689
|
const BUDGET_TIERS = {
|
|
1681
|
-
preview: { subMode: 'agent_preview', budget:
|
|
1690
|
+
preview: { subMode: 'agent_preview', budget: PREVIEW_TIER_BUDGET },
|
|
1682
1691
|
full: { subMode: 'agent_full', budget: 8000 },
|
|
1683
1692
|
xl: { subMode: 'agent_full_xl', budget: 12000 },
|
|
1684
1693
|
};
|
|
@@ -52,7 +52,7 @@ Options:
|
|
|
52
52
|
--fusion <type> Legacy: cc or rrf (ignored for hybrid - always uses robust CC fusion)
|
|
53
53
|
--late-interaction Enable late interaction reranking (if index available)
|
|
54
54
|
--late-interaction-model=ID Use specific model (lateon-code or lateon-code-edge)
|
|
55
|
-
--agent Agent mode: self-contained code blocks. Auto-picks
|
|
55
|
+
--agent Agent mode: self-contained code blocks. Auto-picks 3k/8k/12k
|
|
56
56
|
tier from score-distribution signals (top-1 dominance,
|
|
57
57
|
entropy, candidate-pool breadth) — no need to choose a tier.
|
|
58
58
|
--agent-preview Force the 4k preview tier (rarely needed; --agent auto-picks)
|
|
@@ -268,7 +268,7 @@ export async function generateRegexMatches(searcher, regex, searchDir, options =
|
|
|
268
268
|
const literalStart = performance.now();
|
|
269
269
|
|
|
270
270
|
const sparseForPrefilter = globs.length === 0 ? ensureSparseGramIndex(searcher, options) : null;
|
|
271
|
-
const prefilterFiles = sparseForPrefilter ?
|
|
271
|
+
const prefilterFiles = sparseForPrefilter ? getSparseGramAllFilesWithOverlay(searcher, sparseForPrefilter, options) : null;
|
|
272
272
|
|
|
273
273
|
if (prefilterFiles && prefilterFiles.length > 0) {
|
|
274
274
|
const combined = new Set();
|
|
@@ -83,7 +83,7 @@ Options:
|
|
|
83
83
|
--in <file> Disambiguate symbols by indexed file path
|
|
84
84
|
--query <hint> Natural-language hint used only for structural ranking
|
|
85
85
|
--depth <n> Impact depth, 1-4 (default: 3)
|
|
86
|
-
--budget <n> Token budget, 1000-16000 (default: adaptive
|
|
86
|
+
--budget <n> Token budget, 1000-16000 (default: adaptive 3k/8k/12k)
|
|
87
87
|
--json Output structured JSON
|
|
88
88
|
--format <fmt> plain (no banner) or json
|
|
89
89
|
--no-banner Suppress the identity line
|
|
@@ -126,6 +126,9 @@ async function cmdFind(args) {
|
|
|
126
126
|
process.stderr.write('Usage: ss-find "<query>" --regex "<regex>" [--full|--xl] [-k N]\n');
|
|
127
127
|
process.exit(2);
|
|
128
128
|
}
|
|
129
|
+
// Budget-sweep experiment hook: lets the bench pin the response token budget
|
|
130
|
+
// per-process without changing the agent-visible tool surface.
|
|
131
|
+
const envFindBudget = Number(process.env.SS_SMOKE_FIND_BUDGET || '') || null;
|
|
129
132
|
const effectiveRegex = regex || '';
|
|
130
133
|
const s = await getSweetSearch();
|
|
131
134
|
if (!s.hasLateInteractionIndex) {
|
|
@@ -136,6 +139,7 @@ async function cmdFind(args) {
|
|
|
136
139
|
regex: effectiveRegex || `\\b\\w+\\b`,
|
|
137
140
|
k,
|
|
138
141
|
format,
|
|
142
|
+
...(envFindBudget ? { tokenBudget: envFindBudget } : {}),
|
|
139
143
|
});
|
|
140
144
|
|
|
141
145
|
// Header (visible to agent)
|
|
@@ -212,7 +216,7 @@ async function cmdAgentSearch(args) {
|
|
|
212
216
|
// Main sweet-search auto/CatBoost search with token-budgeted agent packaging.
|
|
213
217
|
//
|
|
214
218
|
// Usage:
|
|
215
|
-
// ss-search "<query>" → format=agent (auto-pick
|
|
219
|
+
// ss-search "<query>" → format=agent (auto-pick 3k/8k/12k)
|
|
216
220
|
// ss-search "<query>" --full → force 8k (rarely needed; default auto-picks)
|
|
217
221
|
// ss-search "<query>" --xl → force 12k (rarely needed; default auto-picks)
|
|
218
222
|
// ss-search "<query>" -k 5 → top-K results
|
|
@@ -240,7 +244,10 @@ async function cmdAgentSearch(args) {
|
|
|
240
244
|
process.exit(1);
|
|
241
245
|
}
|
|
242
246
|
|
|
243
|
-
|
|
247
|
+
// Budget-sweep experiment hook: per-request explicit budget (overrides the
|
|
248
|
+
// auto-tier on the warm server; flows as the `budget` URL param).
|
|
249
|
+
const envSearchBudget = Number(process.env.SS_SMOKE_SEARCH_BUDGET || '') || null;
|
|
250
|
+
const response = await queryServer(query, { topK: k, mode, format, ...(envSearchBudget ? { tokenBudget: envSearchBudget } : {}) });
|
|
244
251
|
if (response?.error) {
|
|
245
252
|
process.stderr.write(`[ss-search] server error: ${response.error}\n`);
|
|
246
253
|
process.exit(1);
|
|
@@ -383,7 +390,11 @@ async function cmdSemantic(args) {
|
|
|
383
390
|
process.stderr.write('Usage: ss-semantic <file> "<question>" [--max-tokens N]\n');
|
|
384
391
|
process.exit(2);
|
|
385
392
|
}
|
|
386
|
-
|
|
393
|
+
// Default 600 (was 800) per the 2026-06 budget sweep — scaled with the 3k
|
|
394
|
+
// preview tier. Env hook overrides the default for sweeps; an explicit
|
|
395
|
+
// --max-tokens flag from the agent always wins.
|
|
396
|
+
const maxTokens = +parseFlag(args.slice(2), '--max-tokens',
|
|
397
|
+
Number(process.env.SS_SMOKE_SEMANTIC_MAXTOKENS || '') || 600);
|
|
387
398
|
const { readSemantic } = await import(path.join(REPO_ROOT, 'core/search/search-read-semantic.js'));
|
|
388
399
|
const r = await readSemantic({
|
|
389
400
|
path: file, query, projectRoot: PROJECT_ROOT,
|
|
@@ -423,7 +434,9 @@ async function cmdTrace(args) {
|
|
|
423
434
|
if (file) opts.filePath = file;
|
|
424
435
|
if (queryHint) opts.queryHint = queryHint;
|
|
425
436
|
if (depth != null) opts.maxDepth = +depth;
|
|
437
|
+
// Budget-sweep experiment hook: env sets the default; explicit --budget wins.
|
|
426
438
|
if (budget != null) opts.tokenBudget = +budget;
|
|
439
|
+
else if (Number(process.env.SS_SMOKE_TRACE_BUDGET || '') > 0) opts.tokenBudget = Number(process.env.SS_SMOKE_TRACE_BUDGET);
|
|
427
440
|
|
|
428
441
|
const response = traceSymbol(symbol, opts);
|
|
429
442
|
if (json) process.stdout.write(JSON.stringify(response, null, 2) + '\n');
|
|
@@ -4,7 +4,7 @@
|
|
|
4
4
|
# semantic / hybrid / structural) with auto-tier budget by default.
|
|
5
5
|
#
|
|
6
6
|
# Usage:
|
|
7
|
-
# ss-search "<query>" # auto-picks
|
|
7
|
+
# ss-search "<query>" # auto-picks 3k / 8k / 12k from signals
|
|
8
8
|
# ss-search "<query>" --full # force 8k (rarely needed; default auto-picks)
|
|
9
9
|
# ss-search "<query>" --xl # force 12k (rarely needed; default auto-picks)
|
|
10
10
|
# ss-search "<query>" -k N # top-K (default 5)
|
package/mcp/server.js
CHANGED
|
@@ -167,7 +167,7 @@ server.registerTool('trace', {
|
|
|
167
167
|
maxDepth: z.number().int().min(1).max(4).default(3).optional()
|
|
168
168
|
.describe('Maximum transitive impact depth (default: 3, capped at 4)'),
|
|
169
169
|
tokenBudget: z.number().int().min(1000).max(16000).optional()
|
|
170
|
-
.describe('Optional token budget. Omit for adaptive
|
|
170
|
+
.describe('Optional token budget. Omit for adaptive 3k/8k/12k selection.'),
|
|
171
171
|
},
|
|
172
172
|
outputSchema: TraceOutputSchema,
|
|
173
173
|
annotations: {
|
package/package.json
CHANGED
|
@@ -1,6 +1,6 @@
|
|
|
1
1
|
{
|
|
2
2
|
"name": "sweet-search",
|
|
3
|
-
"version": "2.5.
|
|
3
|
+
"version": "2.5.5",
|
|
4
4
|
"description": "Sweet Search - SOTA Hybrid Code Search Engine with WASM CatBoost Query Router, Semantic/Lexical/Structural Search, and Multilingual Support",
|
|
5
5
|
"type": "module",
|
|
6
6
|
"main": "core/search/sweet-search.js",
|
|
@@ -143,7 +143,6 @@
|
|
|
143
143
|
"sharp": "^0.34.5",
|
|
144
144
|
"tree-sitter-wasms": "^0.1.13",
|
|
145
145
|
"undici": "^6.23.0",
|
|
146
|
-
"usearch": "^2.21.4",
|
|
147
146
|
"web-tree-sitter": "^0.25.10",
|
|
148
147
|
"zod": "^4.3.6"
|
|
149
148
|
},
|
|
@@ -157,12 +156,13 @@
|
|
|
157
156
|
"vitest": "^4.0.16"
|
|
158
157
|
},
|
|
159
158
|
"optionalDependencies": {
|
|
160
|
-
"
|
|
161
|
-
"@sweet-search/native-darwin-
|
|
162
|
-
"@sweet-search/native-
|
|
163
|
-
"@sweet-search/native-linux-arm64-gnu
|
|
164
|
-
"@sweet-search/native-linux-
|
|
165
|
-
"@sweet-search/native-linux-x64-gnu
|
|
159
|
+
"usearch": "^2.21.4",
|
|
160
|
+
"@sweet-search/native-darwin-arm64": "2.5.5",
|
|
161
|
+
"@sweet-search/native-darwin-x64": "2.5.5",
|
|
162
|
+
"@sweet-search/native-linux-arm64-gnu": "2.5.5",
|
|
163
|
+
"@sweet-search/native-linux-arm64-gnu-cuda": "2.5.5",
|
|
164
|
+
"@sweet-search/native-linux-x64-gnu": "2.5.5",
|
|
165
|
+
"@sweet-search/native-linux-x64-gnu-cuda": "2.5.5"
|
|
166
166
|
},
|
|
167
167
|
"engines": {
|
|
168
168
|
"node": ">=18.0.0"
|