sweet-search 2.5.5 → 2.5.6
- package/README.md +284 -109
- package/assets/banner/banner-frames.webp +0 -0
- package/assets/banner/banner-manifest.json +10 -0
- package/core/banner/render-banner.js +209 -0
- package/core/banner/sixel.js +126 -0
- package/core/indexing/index-codebase-v21.js +5 -0
- package/core/indexing/indexer-ann.js +1 -1
- package/core/indexing/indexer-utils.js +49 -17
- package/core/infrastructure/simd-distance.js +11 -6
- package/package.json +16 -10
- package/scripts/init.js +7 -1
- package/scripts/postinstall-banner.js +46 -0
package/README.md
CHANGED
|
@@ -2,7 +2,7 @@
|
|
|
2
2
|
|
|
3
3
|
<img src="assets/sweet-search-banner-pixelated.svg" alt="sweet-search" width="100%" />
|
|
4
4
|
|
|
5
|
-
### *Maybe grep isn't all you need…*
|
|
5
|
+
### *Maybe grep isn't all you need…* 🍬
|
|
6
6
|
|
|
7
7
|
**A local-first hybrid code-search engine built for AI coding agents.**
|
|
8
8
|
Semantic + lexical + structural search over your working tree, GPU-accelerated local inference,
|
|
@@ -82,10 +82,7 @@ updates itself as you type.
|
|
|
82
82
|
<sub>reconcile daemon tracks your working tree</sub>
|
|
83
83
|
|
|
84
84
|
[🦀 The Native Engine Room](#-the-native-engine-room)<br>
|
|
85
|
-
<sub>four Rust crates +
|
|
86
|
-
|
|
87
|
-
[🎯 The Ranking Stack](#-the-ranking-stack)<br>
|
|
88
|
-
<sub>route → retrieve → fuse → rerank → expand</sub>
|
|
85
|
+
<sub>four Rust crates + INT4 LI compression</sub>
|
|
89
86
|
|
|
90
87
|
</td>
|
|
91
88
|
<td width="24%" valign="top">
|
|
@@ -93,7 +90,7 @@ updates itself as you type.
|
|
|
93
90
|
**THE RECEIPTS**
|
|
94
91
|
|
|
95
92
|
[📊 Benchmarks](#-benchmarks)<br>
|
|
96
|
-
<sub>full-corpus MRR
|
|
93
|
+
<sub>agent cost savings · engine speed · full-corpus MRR</sub>
|
|
97
94
|
|
|
98
95
|
[🙏 Prior Art & Acknowledgements](#-prior-art--acknowledgements)<br>
|
|
99
96
|
<sub>the shoulders we stand on</sub>
|
|
@@ -143,6 +140,94 @@ sweet-search uninstall # clean removal: models, caches, config —
|
|
|
143
140
|
|
|
144
141
|
## 📊 Benchmarks
|
|
145
142
|
|
|
143
|
+
We measure sweet-search four ways — from how much it helps a real agent down to raw engine throughput:
|
|
144
|
+
|
|
145
|
+
<table>
|
|
146
|
+
<tr>
|
|
147
|
+
<td width="25%" valign="top">
|
|
148
|
+
|
|
149
|
+
**① Code-retrieval** *(agent-in-the-loop)*<br>
|
|
150
|
+
<sub>Does it make a real coding agent **cheaper and more useful** when it searches your repo? Paired against each model's own grep-and-read loop.</sub>
|
|
151
|
+
|
|
152
|
+
</td>
|
|
153
|
+
<td width="25%" valign="top">
|
|
154
|
+
|
|
155
|
+
**② Task-completion** *(coming soon)*<br>
|
|
156
|
+
<sub>Does cheaper, denser context **compound** into a higher resolve-rate on multi-step engineering tasks? Harness in progress.</sub>
|
|
157
|
+
|
|
158
|
+
</td>
|
|
159
|
+
<td width="25%" valign="top">
|
|
160
|
+
|
|
161
|
+
**③ Paper-type IR** *(academic)*<br>
|
|
162
|
+
<sub>The standard NL→code retrieval suites (GCSN, M2CRB, CoSQA…), full-corpus MRR@10.</sub>
|
|
163
|
+
|
|
164
|
+
</td>
|
|
165
|
+
<td width="25%" valign="top">
|
|
166
|
+
|
|
167
|
+
**④ Engine speed**<br>
|
|
168
|
+
<sub>Raw systems numbers — grep throughput, query latency, rerank kernels, HNSW.</sub>
|
|
169
|
+
|
|
170
|
+
</td>
|
|
171
|
+
</tr>
|
|
172
|
+
</table>
|
|
173
|
+
|
|
174
|
+
---
|
|
175
|
+
|
|
176
|
+
### 🤖 1. Code-retrieval benchmarks — *the agent-in-the-loop test*
|
|
177
|
+
|
|
178
|
+
We install the evolved agent prompt (the [GEPA-evolved search discipline](#-an-agent-prompt-that-was-evolved-not-written)), point a coding agent at a real repo, and pair it **probe-for-probe against the same model running its own native grep-and-read loop**. Same model, same tasks, same judge — the only difference is whether sweet-search is wired in.
|
|
179
|
+
|
|
180
|
+
<div align="center">
|
|
181
|
+
|
|
182
|
+
<img src="assets/code-retrieval-stats.svg" alt="up to 34% lower cost on Codex · up to 56% fewer tool calls · 1.5–2× more useful context per response · +3pp accuracy on weak models" width="100%" />
|
|
183
|
+
|
|
184
|
+
<sub>top-of-range figures · full per-harness ranges in the dropdown · 11 model×harness cells, paired, multiplicity-controlled</sub>
|
|
185
|
+
|
|
186
|
+
</div>
|
|
187
|
+
|
|
188
|
+
**The headline, in four claims:**
|
|
189
|
+
|
|
190
|
+
- 💰 **Cheaper where the agent thrashes** — up to **−34%** realized cost on Codex; **−18 to −32%** across the GPT-5.5 / opencode / bare-API harnesses.
|
|
191
|
+
- 🔧 **Fewer round-trips** — up to **−56%** tool calls, significant on **9 of 11** cells.
|
|
192
|
+
- ✨ **More useful per response** — **+0.18 to +0.31** on a 5-dimension usefulness score, and *still* denser when length-matched (significant on **8 of 11** cells).
|
|
193
|
+
- 🎯 **Accuracy held — and lifted on the weak** — a statistical tie on flagship models (saturated at 0.94–0.99), and **+3 pp** (up to **+8 pp** out-of-distribution) on weaker models like GLM-5.1 and DeepSeek.
|
|
194
|
+
|
|
195
|
+
<details>
|
|
196
|
+
<summary><b>📋 Full per-harness results & how it's measured</b></summary>
|
|
197
|
+
|
|
198
|
+
The win is **harness-adaptive**: where the native loop is disciplined (Claude Code) it shows up as *denser, more useful context per token*; where it thrashes (Codex floods 30k+ tokens of its own grep output into context) it shows up as a *large cost and tool-call cut*. Either way, **final-answer accuracy never significantly regresses**.
|
|
199
|
+
|
|
200
|
+
| 🧰 Native agent harness | 💰 Realized cost | 🔧 Tool calls | ✨ Useful content / response | 🎯 Final accuracy |
|
|
201
|
+
|---|---:|---:|---:|:--|
|
|
202
|
+
| 🤖 **Codex** (GPT-5.5) | **−30 to −34%** | **−44 to −56%** | +0.06 → +0.17 ↑ | tie *(saturated)* |
|
|
203
|
+
| 🐚 **opencode** (GPT-5.5 / GLM-5.1) | **−18 to −22%** | −15 to −49% | **+0.23 to +0.31** ↑ | tie |
|
|
204
|
+
| 🔌 **bare API** (GPT-5.5 / GLM / DeepSeek) | −15 to −32% ᵃ | −15 to −33% | +0.08 to +0.24 ↑ | tie · **+3 pp on weak models** |
|
|
205
|
+
| 🟣 **Claude Code** (Sonnet / Opus) | −10% to +14% ᵇ | −5 to −33% | +0.18 to +0.29 ↑ | tie |
|
|
206
|
+
|
|
207
|
+
<sub>↑ "Useful content / response" is the per-response delta on a 5-dimension usefulness score (answer-grounding · workable-code · navigability · edit-locality · sufficiency), 0–1 scale. "tie" = final-answer correctness statistically indistinguishable (saturated in the 0.94–0.99 band on flagships).<br>ᵃ the two cheapest bare models cost fractions of a cent either way (GLM +27% of $0.008; DeepSeek −15% of $0.004). ᵇ Opus −5/−10%; Sonnet +8–14%, which is ≈1¢ on a flat-rate subscription for a richer answer.</sub>
|
|
208
|
+
|
|
209
|
+
**Denser, not just longer.** The usefulness lift survives **length-matching** — comparing sweet-search and native responses of *equal token length*, sweet-search's content is significantly higher on **8 of 11** cells. The validated single-number usefulness composite (grounding × content × density) is significant on **all 11** sealed cells.
|
|
210
|
+
|
|
211
|
+
- **What's being compared:** the installed `sweet-search` agent prompt + tools vs. the *same model* using only its built-in file-reading and shell-grep tools. Not a different model — the same model, with and without sweet-search.
|
|
212
|
+
- **Design:** 11 model×harness cells. **Sealed vault** (n=60/arm, the pre-registered primary) opened once; plus **held-out** (n=30) and **out-of-distribution** (n=40) sets for generalization. Stratified, fixed-seed splits.
|
|
213
|
+
- **Judging:** 3-judge panel (DeepSeek-V4-flash + Gemini-3.1-flash-lite + MiniMax-M2.7), paired by probe, 20k-sample bootstrap CIs, **Benjamini–Hochberg FDR** multiplicity correction across each metric family. We report family-level survival counts, never a single cherry-picked cell.
|
|
214
|
+
- **What survives FDR (vault):** useful-content **10/11**, density-composite **11/11**, length-matched content **8/11**, fewer-tool-calls **9/11**. Generalization (held-out + OOD): content **17–18/20**, fewer calls **14/20**.
|
|
215
|
+
- **The token fact that drives everything:** sweet-search's footprint is nearly constant (~1.3k–3.3k tokens) because the tool responses are capped; native's footprint is whatever the model decides to grep — up to **37k tokens** on Codex. That single fact is what drives the cost and tool-call gaps.
|
|
216
|
+
- **Honest caveats we keep attached:** (1) accuracy **ties** on flagship models. It is *not* an accuracy win there, because scores are saturated; the accuracy gains are real only on weaker models. (2) The two weakest cells for *length-matched* density (Codex-low, DeepSeek) are correct-sign but underpowered: Codex's responses are so token-divergent that too few equal-length pairs exist to reach significance, and the DeepSeek cell simply lacks statistical power. Those are honest non-victories, not wins.
|
|
217
|
+
- Full methodology and per-cell tables: [`docs/PHASE7.md`](docs/PHASE7.md).
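
For readers who have not met the Benjamini–Hochberg step-up procedure named in the judging bullet, a minimal JavaScript sketch follows. It is illustrative only: the function name `benjaminiHochberg` and the example p-values are assumptions, not code or data from the evaluation harness.

```javascript
// Benjamini–Hochberg step-up procedure (illustrative sketch, not the harness code).
// Returns one boolean per input p-value: true = survives FDR control at level q.
function benjaminiHochberg(pValues, q = 0.05) {
  const m = pValues.length;
  // Sort indices by ascending p-value so original positions are remembered.
  const order = pValues.map((_, i) => i).sort((a, b) => pValues[a] - pValues[b]);

  // Find the largest 1-based rank k with p_(k) <= (k / m) * q.
  let cutoff = 0;
  order.forEach((idx, rank0) => {
    if (pValues[idx] <= ((rank0 + 1) / m) * q) cutoff = rank0 + 1;
  });

  // Every hypothesis ranked at or below the cutoff is rejected.
  const survives = new Array(m).fill(false);
  for (let r = 0; r < cutoff; r++) survives[order[r]] = true;
  return survives;
}

// Hypothetical p-values for one metric family of four cells.
console.log(benjaminiHochberg([0.01, 0.04, 0.03, 0.5])); // [ true, false, false, false ]
```

The step-up rule is why a family-level survival count is reported: a cell survives or fails relative to the whole family of p-values, not in isolation.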
|
|
218
|
+
|
|
219
|
+
</details>
|
|
220
|
+
|
|
221
|
+
---
|
|
222
|
+
|
|
223
|
+
### 🚧 2. Task-completion benchmarks — *coming soon*
|
|
224
|
+
|
|
225
|
+
> Retrieval quality is necessary but not sufficient. Cheaper, denser context only matters if it **compounds across a real, multi-step engineering task** — finding the code, understanding it, changing it, and not breaking anything. The next suite measures exactly that: **resolve-rate on SWE-bench-style multi-file tasks**, sweet-search-wired vs. native, on the same paired, multiplicity-controlled bar as above. Harness and pilot are in progress — numbers land here when they clear that bar, and not before.
|
|
226
|
+
|
|
227
|
+
---
|
|
228
|
+
|
|
229
|
+
### 📄 3. Paper-type retrieval benchmarks — *academic NL→code IR*
|
|
230
|
+
|
|
146
231
|
> [!WARNING]
|
|
147
232
|
> ⚠️ **THESE NUMBERS ARE STALE — TREAT THEM AS A FLOOR, NOT THE CURRENT SCORE.** ⚠️
|
|
148
233
|
> Several results below were measured on builds that predate major accuracy work
|
|
@@ -153,15 +238,15 @@ sweet-search uninstall # clean removal: models, caches, config —
|
|
|
153
238
|
Every number below is the **`ss-search` pipeline end-to-end** — the same binary you install, querying
|
|
154
239
|
against the **full corpus** (no 99-distractor shortcuts), measured at 26–41 ms p50 on an M3 Max.
|
|
155
240
|
|
|
156
|
-
| Benchmark | What it tests | Queries | MRR@10 |
|
|
241
|
+
| 📚 Benchmark | 🔍 What it tests | # Queries | 🎯 MRR@10 |
|
|
157
242
|
|-----------|---------------|--------:|-------:|
|
|
158
|
-
| **GenCodeSearchNet** | NL→code, 6 languages | 6,000 | **86.6** |
|
|
159
|
-
| **M2CRB** | multilingual NL→code (ES/PT/DE/FR → Py/Java/JS) | 2,814 | **60.2** |
|
|
160
|
-
| CoSQA (test split) | web queries → Python | 500 | 97.0 |
|
|
161
|
-
| CoSQA+ | web queries → Python, multi-match | 20,604 | 72.1 |
|
|
162
|
-
| CLARC | NL→C/C++ (systems code) | 1,245 | 67.4 |
|
|
163
|
-
| AdvTest † | adversarially renamed Python | 1,000 | 91.5 |
|
|
164
|
-
| CoIR † | 10 datasets, 14 languages | 4,500 | 57.3 |
|
|
243
|
+
| 🌐 **GenCodeSearchNet** | NL→code, 6 languages | 6,000 | **86.6** |
|
|
244
|
+
| 🗺️ **M2CRB** | multilingual NL→code (ES/PT/DE/FR → Py/Java/JS) | 2,814 | **60.2** |
|
|
245
|
+
| 🐍 CoSQA (test split) | web queries → Python | 500 | 97.0 |
|
|
246
|
+
| 🐍 CoSQA+ | web queries → Python, multi-match | 20,604 | 72.1 |
|
|
247
|
+
| ⚙️ CLARC | NL→C/C++ (systems code) | 1,245 | 67.4 |
|
|
248
|
+
| 🛡️ AdvTest † | adversarially renamed Python | 1,000 | 91.5 |
|
|
249
|
+
| 🌍 CoIR † | 10 datasets, 14 languages | 4,500 | 57.3 |
|
|
165
250
|
|
|
166
251
|
**GenCodeSearchNet: the strongest result published anywhere, as far as we can tell.** The benchmark's
|
|
167
252
|
own paper tops out at MRR ≤ 0.42 for its fine-tuned baselines (and ≤ 0.10 on the cross-lingual subsets),
|
|
@@ -174,7 +259,7 @@ pools). sweet-search reaches **60.2 full-corpus MRR@10 out of the box**, on Span
|
|
|
174
259
|
and French queries.
|
|
175
260
|
|
|
176
261
|
<details>
|
|
177
|
-
<summary><b>Methodology
|
|
262
|
+
<summary><b>Methodology & staleness flags</b></summary>
|
|
178
263
|
|
|
179
264
|
- **Reproduction:** result artifacts live in `eval/results/`; rerun via `eval/run_all.js`.
|
|
180
265
|
- **Protocol note:** published baselines for GCSN and CoSQA-style benchmarks typically rank the gold snippet against 99 sampled distractors. All sweet-search numbers rank against the full benchmark corpus — strictly harder.
|
|
@@ -182,18 +267,26 @@ and French queries.
|
|
|
182
267
|
- **Honesty corner:** CrossCodeEval — cross-file *completion context* retrieval, a different task than NL search — sits at 0.12. We don't optimize for it and report it anyway.
|
|
183
268
|
- Dates and per-language breakdowns: [`docs/BENCHMARKS_EXPLAINED.md`](docs/BENCHMARKS_EXPLAINED.md).
|
|
184
269
|
|
|
185
|
-
|
|
270
|
+
</details>
|
|
186
271
|
|
|
187
|
-
|
|
188
|
-
|------|--------|--------|
|
|
189
|
-
| Indexed grep vs ripgrep | **10.2× faster** at the median (8.5–17.7× across 5 repos, 353 realistic queries, 1 ms p50 — identical match counts on every query) | [`docs/GREP_INDEXING_STRATEGY.md`](docs/GREP_INDEXING_STRATEGY.md) |
|
|
190
|
-
| Warm query latency (native CLI) | **2.9 ms** warm · 108 ms cold | [`docs/INIT_STRATEGY.md`](docs/INIT_STRATEGY.md) |
|
|
191
|
-
| MaxSim rerank kernels | **1.26 s → 27 ms** for a 231-candidate pass (47× native Rust; 16× WASM SIMD) | [`docs/MAXSIM_OPTIMIZATION.md`](docs/MAXSIM_OPTIMIZATION.md) |
|
|
192
|
-
| HNSW tuning for code | **−33%** search p50, **+5.9 pp** recall@200 | [`docs/HNSW_APPROACH.md`](docs/HNSW_APPROACH.md) |
|
|
193
|
-
| Indexing memory | peak JS heap **785 MB → 213 MB** | [`docs/DISK_FLUSHING_STRATEGY.md`](docs/DISK_FLUSHING_STRATEGY.md) |
|
|
194
|
-
| CoreML cascade (M3 Max) | **18% faster** full indexing vs the Metal baseline | [`docs/INIT_STRATEGY.md`](docs/INIT_STRATEGY.md) |
|
|
272
|
+
---
|
|
195
273
|
|
|
196
|
-
|
|
274
|
+
### ⚡ 4. Engine speed — *systems benchmarks, measured in-repo*
|
|
275
|
+
|
|
276
|
+
<div align="center">
|
|
277
|
+
|
|
278
|
+
**10.2×** faster than ripgrep at the median · **2.9 ms** warm queries · **47×** faster MaxSim kernels · **−33%** HNSW search p50
|
|
279
|
+
|
|
280
|
+
</div>
|
|
281
|
+
|
|
282
|
+
| ⚙️ What | 📈 Result | 📄 Source |
|
|
283
|
+
|------|--------|--------|
|
|
284
|
+
| ⚡ Indexed grep vs ripgrep | **10.2× faster** at the median (8.5–17.7× across 5 repos, 353 realistic queries, 1 ms p50 — identical match counts on every query) | [`docs/GREP_INDEXING_STRATEGY.md`](docs/GREP_INDEXING_STRATEGY.md) |
|
|
285
|
+
| ⏱️ Warm query latency (native CLI) | **2.9 ms** warm · 108 ms cold | [`docs/INIT_STRATEGY.md`](docs/INIT_STRATEGY.md) |
|
|
286
|
+
| 🧮 MaxSim rerank kernels | **1.26 s → 27 ms** for a 231-candidate pass (47× native Rust; 16× WASM SIMD) | [`docs/MAXSIM_OPTIMIZATION.md`](docs/MAXSIM_OPTIMIZATION.md) |
|
|
287
|
+
| 🧠 HNSW tuning for code | **−33%** search p50, **+5.9 pp** recall@200 | [`docs/HNSW_APPROACH.md`](docs/HNSW_APPROACH.md) |
|
|
288
|
+
| 💾 Indexing memory | peak JS heap **785 MB → 213 MB** | [`docs/DISK_FLUSHING_STRATEGY.md`](docs/DISK_FLUSHING_STRATEGY.md) |
|
|
289
|
+
| 🍏 CoreML cascade (M3 Max) | **18% faster** full indexing vs the Metal baseline | [`docs/INIT_STRATEGY.md`](docs/INIT_STRATEGY.md) |
|
|
197
290
|
|
|
198
291
|
## 🧰 The Six Tools
|
|
199
292
|
|
|
@@ -202,43 +295,115 @@ to be *consumed by an agent* — a useful answer, not a wall of matches to scrol
|
|
|
202
295
|
|
|
203
296
|
| Tool | What you give it | What you get back |
|
|
204
297
|
|------|------------------|-------------------|
|
|
205
|
-
| `ss-search` | a natural-language query | ranked, **self-contained code blocks** |
|
|
206
|
-
| `ss-grep` | an exact regex/literal | `file:line`
|
|
207
|
-
| `ss-find` | a regex **+** a query | regex matches, **semantically re-ranked, as code blocks** |
|
|
208
|
-
| `ss-semantic` | a file **+** a question | just the **relevant spans** of that file |
|
|
209
|
-
| `ss-trace` | a symbol | **callers + callees + impact**, in one call |
|
|
210
|
-
| `ss-read` | a file (± line range) | exact bytes **+ symbol metadata** |
|
|
298
|
+
| 1. [`ss-search`](#tool-ss-search) | a natural-language query | ranked, **self-contained code blocks** |
|
|
299
|
+
| 2. [`ss-grep`](#tool-ss-grep) | an exact regex/literal | every `file:line` hit, **ripgrep-identical** |
|
|
300
|
+
| 3. [`ss-find`](#tool-ss-find) | a regex **+** a query | regex matches, **semantically re-ranked, as code blocks** |
|
|
301
|
+
| 4. [`ss-semantic`](#tool-ss-semantic) | a file **+** a question | just the **relevant spans** of that file |
|
|
302
|
+
| 5. [`ss-trace`](#tool-ss-trace) | a symbol | **callers + callees + impact**, in one call |
|
|
303
|
+
| 6. [`ss-read`](#tool-ss-read) | a file (± line range) | exact bytes **+ symbol metadata** |
|
|
211
304
|
|
|
212
|
-
|
|
305
|
+
---
|
|
213
306
|
|
|
214
|
-
|
|
215
|
-
ss-search
|
|
307
|
+
<a id="tool-ss-search"></a>
|
|
308
|
+
### 1. 🔍 `ss-search` — hybrid search powerhouse
|
|
309
|
+
|
|
310
|
+
A hybrid search pipeline with late interaction reranking that returns actual code blocks.
|
|
311
|
+
|
|
312
|
+
State-of-the-art on several published [`benchmarks`](#-benchmarks).
|
|
313
|
+
|
|
314
|
+
```mermaid
|
|
315
|
+
flowchart TD
|
|
316
|
+
Q(["🔍 natural-language query"]) --> ROUTE{{"🧭 WASM CatBoost router · lexical / hybrid"}}
|
|
317
|
+
|
|
318
|
+
ROUTE --> BM["📑 <b>BM25F</b><br/>field-weighted FTS5"]
|
|
319
|
+
ROUTE --> ANN
|
|
320
|
+
|
|
321
|
+
subgraph ANN ["🧬 three-stage ANN cascade"]
|
|
322
|
+
direction LR
|
|
323
|
+
BIN["binary <b>HNSW</b><br/>Hamming · ~100µs"] --> INT["INT8<br/>rescore"] --> FL["float32<br/>mmap sidecar"]
|
|
324
|
+
end
|
|
325
|
+
|
|
326
|
+
BM --> FUSE
|
|
327
|
+
ANN --> FUSE
|
|
328
|
+
FUSE["🔀 <b>CCFusion</b><br/>convex combo · RRF fallback"] --> ROW1
|
|
329
|
+
|
|
330
|
+
subgraph ROW1 [" "]
|
|
331
|
+
direction LR
|
|
332
|
+
IAR["⚓ <b>IAR</b><br/>exact-symbol injection"] --> INTENT["🎯 intent rerank<br/>demote docs · tests · config"]
|
|
333
|
+
end
|
|
334
|
+
|
|
335
|
+
ROW1 --> ROW2
|
|
336
|
+
|
|
337
|
+
subgraph ROW2 [" "]
|
|
338
|
+
direction LR
|
|
339
|
+
GRAPH["🕸️ graph expansion<br/>typed edges · 1–2 hops · <b>PathRAG</b>"] --> MAXSIM["🧮 <b>Late-Interaction Rerank</b><br/>⚡ native Rust MaxSim kernel"] --> OUT(["🏁 <b>self-contained code blocks</b><br/>whole functions · 3k/8k/12k budget"])
|
|
340
|
+
end
|
|
341
|
+
|
|
342
|
+
classDef io fill:#fde68a,stroke:#f59e0b,color:#000;
|
|
343
|
+
classDef out fill:#bbf7d0,stroke:#15803d,color:#000,stroke-width:3px;
|
|
344
|
+
classDef route fill:#e0e7ff,stroke:#818cf8,color:#000;
|
|
345
|
+
classDef lex fill:#dbeafe,stroke:#60a5fa,color:#000;
|
|
346
|
+
classDef fuse fill:#f3e8ff,stroke:#c084fc,color:#000;
|
|
347
|
+
classDef rank fill:#ffe4e6,stroke:#fb7185,color:#000;
|
|
348
|
+
|
|
349
|
+
class Q io;
|
|
350
|
+
class OUT out;
|
|
351
|
+
class ROUTE route;
|
|
352
|
+
class BM,BIN,INT,FL lex;
|
|
353
|
+
class FUSE,IAR fuse;
|
|
354
|
+
class INTENT,GRAPH,MAXSIM rank;
|
|
355
|
+
|
|
356
|
+
style ANN fill:#eff6ff,stroke:#93c5fd,color:#000;
|
|
357
|
+
style ROW1 fill:none,stroke:none;
|
|
358
|
+
style ROW2 fill:none,stroke:none;
|
|
216
359
|
```
|
|
217
360
|
|
|
218
|
-
|
|
361
|
+
<sub>↑ The diagram traces the **hybrid** route. A pure-lexical query — or a literal file path — short-circuits at the router straight to BM25F, skipping the vector cascade and fusion.</sub>
|
|
219
362
|
|
|
220
|
-
|
|
221
|
-
|
|
222
|
-
|
|
223
|
-
|
|
224
|
-
|
|
225
|
-
|
|
226
|
-
|
|
227
|
-
|
|
363
|
+
| Stage | What it actually does |
|
|
364
|
+
|-------|-----------------------|
|
|
365
|
+
| 🧭 **Route** | **WASM-exported CatBoost** · lexical / hybrid · **~10 µs** routing · low-confidence → max-recall hybrid |
|
|
366
|
+
| 🧬 **Retrieve** | • **Lexical** — **BM25F** over field-weighted FTS5 (name 10× · signature 5× · alias 4× · doc 1×)<br/>• **Embed** — query vectorized by the local **CodeRankEmbed** model (swappable for Voyage / Jina / Codestral)<br/>• **Vector cascade** — binary **HNSW** (Hamming, 64-byte, ~100 µs) → INT8 rescore → exact float32 from a memory-mapped sidecar |
|
|
367
|
+
| 🔀 **Fuse** | • **CCFusion** — convex-combine both rankings · per-route weights · quantile-normalized<br/>• **MMR** (λ=0.9) diversity pass over the fused list<br/>• auto **RRF** (k=60) fallback on degenerate score distributions |
|
|
368
|
+
| ⚓ **Anchor** | • **IAR** (Identifier Anchor Retrieval) — a real symbol in the query fires an exact-name code-graph lookup that injects that entity, even when the encoder ranked it too low |
|
|
369
|
+
| 🎯 **Intent Rerank** | • demote docs / tests / config when you want implementation<br/>• log-scaled call-site boosts surface the most-referenced function |
|
|
370
|
+
| 🕸️ **Graph Expansion** | • typed-edge walks (`imports`/`extends`/`calls`/`uses`) · adaptive 2-hop on the AST graph · edges picked by intent<br/>• **PathRAG** flow pruning + degree normalization → hubs can't dominate |
|
|
371
|
+
| 🧮 **Late-Interaction Rerank** | • Query embedded per-token by **LateOn-Code** (149M; a 17M **edge** variant auto-selected on low-RAM hosts)<br/>• **MaxSim** against the pre-indexed quantized token vectors<br/>• native Rust+Rayon MaxSim kernel ⚡ · WASM-SIMD fallback (1.26 s → 27 ms on a 231-candidate rerank) |
|
|
372
|
+
| 📦 **Package** | • entity-aware expansion → whole functions (imports, docstrings, decorators)<br/>• same-file overlap demotion → diverse, non-overlapping spans<br/>• auto-selected **3k / 8k / 12k** token budget |
|
|
228
373
|
|
|
229
374
|
<details>
|
|
230
|
-
<summary><b
|
|
375
|
+
<summary><b>🌶️ Extra spice — the bits that didn't fit the diagram</b></summary>
|
|
376
|
+
|
|
377
|
+
**🧠 The HNSW, in full** ([full writeup](docs/HNSW_APPROACH.md)). Stage 1 is a from-scratch binary HNSW, and every "advanced" trick ships **on by default**:
|
|
378
|
+
- **Heuristic neighbor selection** (HNSW Algorithm 4) + **M0 = 2M** on layer 0 — a real graph backbone, not naïve closest-M
|
|
379
|
+
- **Shuffled insertion order** — no filesystem-ordering bias baked into the highway structure
|
|
380
|
+
- **Discovery-rate adaptive early termination** + **adaptive ef** — easy queries stop early, hard ones keep their budget
|
|
381
|
+
- A **denser graph than most vendors ship** (M=64 · efC=800 · efS=400) — which broke an 80.6 % → 86.5 % recall@200 plateau and cut p50 latency ~33 %
|
|
382
|
+
- **Zero-GC search**: typed-array heaps + generation-stamped visited lists — no per-query allocation
|
|
383
|
+
- 64-byte sign-bit vectors (Hamming) → INT8 → exact float32 from a memory-mapped sidecar
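
To make the first cascade stage concrete, here is a rough sketch of sign-bit binarization and Hamming distance. It assumes one bit per dimension, so a 64-byte code holds 512 bits; the names `binarize` and `hamming` are hypothetical, and the shipped index's layout and kernels may differ.

```javascript
// Sign-bit binarization + Hamming distance (illustrative sketch; names are hypothetical).
function binarize(vec) {
  const bits = new Uint8Array(Math.ceil(vec.length / 8));
  for (let i = 0; i < vec.length; i++) {
    if (vec[i] > 0) bits[i >> 3] |= 1 << (i & 7); // set the bit when the component is positive
  }
  return bits;
}

// Precomputed popcount for every possible byte value.
const POPCOUNT = new Uint8Array(256);
for (let i = 1; i < 256; i++) POPCOUNT[i] = (i & 1) + POPCOUNT[i >> 1];

// Number of differing bits between two equal-length codes.
function hamming(a, b) {
  let dist = 0;
  for (let i = 0; i < a.length; i++) dist += POPCOUNT[a[i] ^ b[i]];
  return dist;
}

// Two synthetic 512-dimensional vectors with known sign patterns.
const x = binarize(Float32Array.from({ length: 512 }, (_, i) => (i % 2 ? 1 : -1)));
const y = binarize(Float32Array.from({ length: 512 }, (_, i) => (i % 4 ? 1 : -1)));
console.log(x.length, hamming(x, y)); // 64 128
```

XOR plus popcount is the reason this stage is cheap: comparing two codes touches 64 bytes and performs no floating-point work, which leaves the INT8 and float32 stages to restore precision on a much smaller candidate set.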
|
|
384
|
+
|
|
385
|
+
**⚡ Why it's quick.** A native Rust + Rayon **MaxSim kernel** (47× over scalar; 16× WASM-SIMD fallback) · int4-quantized, binary-packed token vectors (plain INT4 is the shipped path — the full [TurboQuant](docs/LI_QUANTIZATION_STRATEGY.md) algorithm is researched but deferred; binary packing alone cut the LI index ~3.4×, 1.34 GiB → ~396 MiB) · a memory-mapped float32 sidecar that skips SQL on the rescore hot path · **score-spread adaptive pooling** (decisive queries shrink the rescore pool, ambiguous ones widen it) · and a warm daemon that answers in a single NAPI call — no process is ever forked.
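
As a reference point for what the MaxSim kernel computes, here is a deliberately naive scalar sketch. It uses plain float arrays with no quantization, packing, or parallelism; `maxSim` and the toy token vectors are assumptions for illustration, not the native kernel.

```javascript
// MaxSim late-interaction score (illustrative scalar sketch, not the native kernel).
function dot(a, b) {
  let s = 0;
  for (let i = 0; i < a.length; i++) s += a[i] * b[i];
  return s;
}

// For each query token, take its best-matching document token, then sum those maxima.
function maxSim(queryTokens, docTokens) {
  let total = 0;
  for (const q of queryTokens) {
    let best = -Infinity;
    for (const d of docTokens) best = Math.max(best, dot(q, d));
    total += best;
  }
  return total;
}

// Toy 2-dimensional token embeddings.
const query = [[1, 0], [0, 1]];
const docA = [[1, 0], [0, 1], [0.5, 0.5]];
const docB = [[0.5, 0.5]];
console.log(maxSim(query, docA), maxSim(query, docB)); // 2 1
```

The cost is one dot product per query-token and document-token pair for every candidate, which is why a vectorized, multi-threaded kernel pays off on a rerank pass over hundreds of candidates.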
|
|
231
386
|
|
|
232
|
-
|
|
233
|
-
-
|
|
387
|
+
**🎛️ Priors & structure.**
|
|
388
|
+
- **Quality priors:** every chunk carries a 0–1 prior from test proximity, git recency, symbol centrality (PageRank), comment density, and complexity — production code surfaces, stale fixtures sink.
|
|
389
|
+
- **Community structure:** a canonical **Leiden** pass detects code communities on the entity graph at index time, feeding vocabulary prewarming and structural signals — it understands your modules, not just your directories.
|
|
390
|
+
- **Multilingual:** 14 languages get full tree-sitter AST treatment; a 39-config registry covers 70+ extensions beyond that. Router features handle camelCase/snake_case, CJK density, and German compounds.
|
|
391
|
+
- **Format-gated signals:** structure-aware boosts and demotions (symbol-exact, path-token, mega-entity) fire only in agent mode — they help agent-shaped queries and would hurt plain NL, so they stay gated by default.
|
|
392
|
+
|
|
393
|
+
**🛟 Rescues & honest trade-offs.**
|
|
394
|
+
- **Long-query rescue:** wordy NL queries that FTS5 would tokenize into an unsatisfiable `AND` fall back to multi-query BM25F + RRF — one query per content keyword, fused.
|
|
395
|
+
- **Near-duplicate dedup:** a SimHash + MinHash-LSH pass (Jaccard τ=0.9) clusters copy-paste and vendored code at index time; aliases reuse their exemplar's vectors and skip *both* the bi-encoder and late-interaction encoding.
|
|
396
|
+
- **A negative result we ship anyway:** we built a full cross-encoder rerank cascade behind an adaptive confidence gate and measured it on our eval sets. It did not beat MaxSim, and it cost 3× the latency, so it ships **disabled** (set `SWEET_SEARCH_CASCADE_ENABLED=true` to try it). We'd rather ship the faster path than a fancier diagram.
|
|
397
|
+
- **Budget tiers:** the expensive 8k/12k tiers fire on ~1–5 % of queries — the default stays cheap. Force one with `--full` / `--xl`, or pick a mode with `--mode lexical|semantic|hybrid|pattern`.
|
|
398
|
+
|
|
399
|
+
Also available as `sweet-search "<query>"` on the CLI and the `search` MCP tool.
|
|
234
400
|
|
|
235
401
|
</details>
|
|
236
402
|
|
|
237
|
-
|
|
403
|
+
---
|
|
238
404
|
|
|
239
|
-
|
|
240
|
-
ss-grep
|
|
241
|
-
```
|
|
405
|
+
<a id="tool-ss-grep"></a>
|
|
406
|
+
### 2. ⚡ `ss-grep` — grep, minus every wasted millisecond
|
|
242
407
|
|
|
243
408
|
**10.2× faster than ripgrep end-to-end at the median** — measured across **353 realistic queries on 5 real repos**
|
|
244
409
|
(range 8.5–17.7× per repo, 1 ms p50), with **identical match counts on every single query**. Three things buy that:
|
|
@@ -247,7 +412,7 @@ ss-grep "parseRetryAfter" -k 10
|
|
|
247
412
|
- **Regex-AST literal extraction + SIMD intersection**: required substrings are pulled from the pattern's syntax tree, posting lists are intersected with NEON/SSE2 block merges (galloping search for skewed sizes), and only the files that *can* match — typically 0.1–5% of the corpus — see the real regex.
|
|
248
413
|
- **Fully in-process**: verification runs on Rust's regex crate with Rayon across all cores, inside the warm daemon, in a single NAPI call. No child process is ever spawned — zero fork/exec, zero pipe I/O, zero JSON re-parsing.
|
|
249
414
|
|
|
250
|
-
|
|
415
|
+
Every match comes back in stable `file:line` order — ripgrep-identical counts, optional context lines — with no relevance guessing, no subprocess, in one warm call.
|
|
251
416
|
|
|
252
417
|
<details>
|
|
253
418
|
<summary><b>More</b></summary>
|
|
@@ -257,7 +422,10 @@ Hits come back **ranked and scored**, so an agent can trust the top one and stop
|
|
|
257
422
|
|
|
258
423
|
</details>
|
|
259
424
|
|
|
260
|
-
|
|
425
|
+
---
|
|
426
|
+
|
|
427
|
+
<a id="tool-ss-find"></a>
|
|
428
|
+
### 3. `ss-find` — ColGrep, on a faster engine
|
|
261
429
|
|
|
262
430
|
```bash
|
|
263
431
|
ss-find "token refresh logic" --regex "refresh.*[Tt]oken"
|
|
@@ -280,15 +448,18 @@ semantically ranked — but rebuilt on our own substrate:
|
|
|
280
448
|
|
|
281
449
|
</details>
|
|
282
450
|
|
|
283
|
-
|
|
451
|
+
---
|
|
452
|
+
|
|
453
|
+
<a id="tool-ss-semantic"></a>
|
|
454
|
+
### 4. `ss-semantic` — hybrid retrieval, scoped to one file
|
|
284
455
|
|
|
285
456
|
```bash
|
|
286
457
|
ss-semantic src/auth/session.ts "where does the cookie get its expiry?"
|
|
287
458
|
```
|
|
288
459
|
|
|
289
460
|
You know the file; this finds the lines. Every indexed chunk of the file is scored by **three independent
|
|
290
|
-
signals** — BM25-style lexical term match, exact symbol-name match (weighted 1.5×), and
|
|
291
|
-
MaxSim over
|
|
461
|
+
signals** — BM25-style lexical term match, exact symbol-name match (weighted 1.5×), and per-token
|
|
462
|
+
**MaxSim** late interaction over the **LateOn-Code** embeddings — fused with **Reciprocal Rank Fusion** (k=60), with
|
|
292
463
|
symbol-less fragment chunks demoted 0.85× so real definitions win ties. The top spans are then
|
|
293
464
|
**re-read from disk** (±2 context lines, overlapping spans merged), so the answer is filesystem ground
|
|
294
465
|
truth even mid-edit; if the file is newer than its index entry you get an explicit staleness warning.
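
A minimal sketch of Reciprocal Rank Fusion with k=60 may help here. It shows only the standard RRF formula; the 1.5× symbol weight and the 0.85× fragment demotion described above are not modeled, and the chunk ids and rankings are made up.

```javascript
// Reciprocal Rank Fusion (illustrative sketch; ids and rankings are made up).
// Each ranking lists ids best-first. score(id) = sum over rankings of 1 / (k + rank).
function reciprocalRankFusion(rankings, k = 60) {
  const scores = new Map();
  for (const ranking of rankings) {
    ranking.forEach((id, i) => {
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + i + 1)); // ranks are 1-based
    });
  }
  // Highest fused score first.
  return [...scores.entries()].sort((a, b) => b[1] - a[1]).map(([id]) => id);
}

const lexical = ["chunkA", "chunkB", "chunkC"];
const symbol = ["chunkB", "chunkA"];
const maxsim = ["chunkB", "chunkC", "chunkA"];
console.log(reciprocalRankFusion([lexical, symbol, maxsim])); // [ 'chunkB', 'chunkA', 'chunkC' ]
```

Because RRF uses ranks rather than raw scores, the three signals never need to be put on a common scale before they are combined.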
|
|
@@ -303,7 +474,10 @@ The useful answer: just the relevant spans with line numbers — not the whole f
|
|
|
303
474
|
|
|
304
475
|
</details>
|
|
305
476
|
|
|
306
|
-
|
|
477
|
+
---
|
|
478
|
+
|
|
479
|
+
<a id="tool-ss-trace"></a>
|
|
480
|
+
### 5. `ss-trace` — graph algorithms, not grep guesswork
|
|
307
481
|
|
|
308
482
|
```bash
|
|
309
483
|
ss-trace processOrder --in src/orders/service.py
|
|
@@ -330,7 +504,10 @@ bounds impact traversal (1–4).
|
|
|
330
504
|
|
|
331
505
|
</details>
|
|
332
506
|
|
|
333
|
-
|
|
507
|
+
---
|
|
508
|
+
|
|
509
|
+
<a id="tool-ss-read"></a>
|
|
510
|
+
### 6. `ss-read` — exact bytes, with the index's knowledge attached
|
|
334
511
|
|
|
335
512
|
```bash
|
|
336
513
|
ss-read src/db/pool.js 120 180
|
|
@@ -353,6 +530,8 @@ without another search.
|
|
|
353
530
|
> capability is equally available as `sweet-search` CLI subcommands and as MCP tools — see
|
|
354
531
|
> [Works With Your Agent](#-works-with-your-agent).
|
|
355
532
|
|
|
533
|
+
---
|
|
534
|
+
|
|
356
535
|
## 🧠 An Agent Prompt That Was Evolved, Not Written
|
|
357
536
|
|
|
358
537
|
Giving an agent six tools is easy. Getting it to *stop grepping in circles* is not.
|
|
@@ -385,36 +564,55 @@ What it teaches:
|
|
|
385
564
|
|
|
386
565
|
## ⚡ GPU-Accelerated Indexing, Fully Local
|
|
387
566
|
|
|
388
|
-
|
|
389
|
-
|
|
390
|
-
|
|
567
|
+
> **Chunk → enrich → embed → quantize** — every step on-device and in Rust. Batches are sized to *your CPU's actual cache*, two open code-models do the encoding, and two separate quantizations make the index both **faster to build** and **small enough to live in RAM**. Zero API keys; nothing ever leaves the machine.
|
|
568
|
+
|
|
569
|
+
| ① Structure-aware chunk | ② Enrich from structure | ③ Embed — two models | ④ Quantize + persist |
|
|
570
|
+
|:--|:--|:--|:--|
|
|
571
|
+
| cAST over tree-sitter ASTs — whole functions, never sliced mid-body | deterministic preamble from the code graph — **no LLM call** | dense **CodeRankEmbed** + per-token **LateOn-Code** | INT8 weights → **2× faster build** · INT4 vectors → **fits in RAM** |
|
|
572
|
+
|
|
573
|
+
**The inference engine, picked for your silicon:**
|
|
391
574
|
|
|
392
575
|
| Your hardware | What runs |
|
|
393
|
-
|
|
394
|
-
| Apple Silicon (M1+) | candle **Metal**, BF16, fused SDPA attention |
|
|
395
|
-
| Apple Silicon (M3+) |
|
|
396
|
-
| NVIDIA GPU (SM 7.0+) | candle **CUDA**; **flash-attention** on Ampere+ |
|
|
397
|
-
|
|
|
398
|
-
|
|
399
|
-
|
|
400
|
-
structure-aware chunking over real **tree-sitter ASTs
|
|
401
|
-
|
|
402
|
-
|
|
403
|
-
|
|
404
|
-
|
|
405
|
-
|
|
406
|
-
|
|
407
|
-
|
|
576
|
+
|--|--|
|
|
577
|
+
| 🍏 Apple Silicon (M1+) | candle **Metal**, BF16, fused SDPA attention |
|
|
578
|
+
| 🍏 Apple Silicon (M3+) | … plus a **CoreML Neural Engine cascade** — ~18% faster full index (measured, M3 Max) |
|
|
579
|
+
| 🟩 NVIDIA GPU (SM 7.0+) | candle **CUDA**; **flash-attention** on Ampere+ |
|
|
580
|
+
| 💻 No accelerator | **ONNX Runtime INT8** — tuned CPU path, 132 MB model, **zero GPU weights downloaded** |
|
|
581
|
+
|
|
582
|
+
### 🧩 Chunking — every chunk is whole code, never a fixed window
|
|
583
|
+
- **[cAST](https://arxiv.org/abs/2506.15655)** structure-aware chunking over real **tree-sitter** ASTs: a recursive *split-then-merge* greedily packs sibling AST nodes up to the size cap and recurses *into* nodes too big to fit. So a chunk is always a **function, a class, or a contiguous run of declarations** — never a body cut in half, never a string split mid-literal.
|
|
584
|
+
- **14 languages** get true AST grammars — `JS · TS · TSX · Python · Go · Rust · Java · C · C++ · Ruby · PHP · Kotlin · Swift · C#` — and a **39-config regex registry** carries structure-aware chunking to **70+ more extensions**.
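
To make the *split-then-merge* idea concrete, here is a minimal sketch of greedy sibling packing with recursion into oversized nodes. It assumes a simplified, hypothetical node shape (`{ label, size, children }`) rather than real tree-sitter nodes, and it measures size in abstract units; the actual chunker is not shown in this README.

```javascript
// Hypothetical node shape: { label, size, children }.
// `size` stands in for whatever unit the real size cap is measured in.
function chunkNodes(nodes, maxSize) {
  const chunks = [];
  let current = [];
  let currentSize = 0;

  const flush = () => {
    if (current.length > 0) {
      chunks.push(current);
      current = [];
      currentSize = 0;
    }
  };

  for (const node of nodes) {
    if (node.size > maxSize) {
      // Too big to fit in any chunk: close the open chunk, then recurse into the node.
      flush();
      if (node.children && node.children.length > 0) {
        chunks.push(...chunkNodes(node.children, maxSize));
      } else {
        chunks.push([node.label]); // indivisible leaf: emitted alone, oversized
      }
      continue;
    }
    // Greedy merge: pack siblings together until the cap would be exceeded.
    if (currentSize + node.size > maxSize) flush();
    current.push(node.label);
    currentSize += node.size;
  }
  flush();
  return chunks;
}

const file = [
  { label: 'imports', size: 40 },
  { label: 'helper()', size: 60 },
  { label: 'class Big', size: 300, children: [
    { label: 'Big.a()', size: 120 },
    { label: 'Big.b()', size: 100 },
    { label: 'Big.c()', size: 90 },
  ] },
];
console.log(chunkNodes(file, 200));
```

With a cap of 200, the two small declarations share one chunk, and the oversized class is split along method boundaries instead of at an arbitrary offset.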
|
|
585
|
+
|
|
586
|
+
### 🏷️ Metadata — context the encoder can actually see
|
|
587
|
+
- Every chunk ships its **symbol name · entity type · signature · line span** — the metadata that powers the code graph, `ss-read` annotations, and the self-contained answers everywhere else.
|
|
588
|
+
- **Contextual enrichment:** before embedding, each chunk is prefixed with a structured preamble assembled from the AST + code graph — *file path · enclosing-scope breadcrumb · name & type · merged siblings · the imports it actually uses*. **Both** encoders see it, so a bare `getId()` still retrieves on the class and module around it.
|
|
589
|
+
- Our nod to **[Anthropic's Contextual Retrieval](https://www.anthropic.com/news/contextual-retrieval)** — except they prepend an *LLM-generated* summary (one model call per chunk); we derive the context **deterministically from structure**: no LLM, no per-chunk inference, regenerated for free on every reindex. **Tuned per language** from GenCodeSearchNet ablations — Python stays minimal, the Java family keeps a slug-stripped path, JS/Ruby/Go/C/C++/Rust get the full preamble where closures and imports earn their keep.
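
A deterministic preamble of this kind can be sketched in a few lines. Everything specific here is an assumption made for illustration: the chunk field names, the `label: value` layout, and the separator are hypothetical, and the per-language routing described above is left out.

```javascript
// Hypothetical chunk shape and preamble layout; not the format sweet-search emits.
function buildPreamble(chunk) {
  const lines = [];
  if (chunk.filePath) lines.push(`file: ${chunk.filePath}`);
  if (chunk.scope?.length) lines.push(`scope: ${chunk.scope.join(' > ')}`);
  if (chunk.name) lines.push(`${chunk.entityType || 'symbol'}: ${chunk.name}`);
  if (chunk.siblings?.length) lines.push(`siblings: ${chunk.siblings.join(', ')}`);
  if (chunk.imports?.length) lines.push(`imports: ${chunk.imports.join(', ')}`);
  return lines.join('\n');
}

// The encoder sees the preamble followed by the raw code.
function enrich(chunk) {
  return `${buildPreamble(chunk)}\n\n${chunk.code}`;
}

const sample = {
  filePath: 'src/models/user.js',
  scope: ['User'],
  entityType: 'method',
  name: 'getId',
  siblings: ['getName'],
  imports: ['./db.js'],
  code: 'getId() { return this.id; }',
};
console.log(enrich(sample));
```

Because the output is a pure function of structural metadata, the same chunk always yields the same text, which is what makes regeneration on reindex cheap.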
|
|
590
|
+
|
|
591
|
+
### 🧠 Cache-aware batching — we read your CPU before we batch it
|
|
592
|
+
- We **detect your last-level cache at runtime** (`hw.perflevel0.l2cachesize`, the 16 MB P-cluster on Apple Silicon, *not* the smaller E-cluster; Intel L3; or `/sys/.../cache` on Linux), then size every embedding batch so **one transformer layer's weights *plus* the batch's activations stay resident in cache**. Nothing spills to main memory mid-layer; on a long-sequence tail, that is the difference between running at B=1 and taking a measured **2.1× per-chunk slowdown**.
|
|
593
|
+
- **Uses every core the hardware really has** — full count on ARM/Apple Silicon; x86 SMT siblings discounted because they don't scale inference linearly.
|
|
594
|
+
- **ORT drives the CPU path** (ONNX Runtime); GPU hosts swap in fused kernels (below). Either way inference runs off the event loop as a napi `AsyncTask`, so tokenization and SQLite writes overlap compute instead of stalling behind it.
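
The sizing rule in the first bullet (one layer's weights plus the batch's activations must fit in the last-level cache) can be written as simple arithmetic. The formula and all numbers below are assumptions chosen to make the example easy to follow; they are not sweet-search's measured values or its actual heuristic.

```javascript
// Illustrative only: parameter names, formula, and numbers are assumptions.
function batchSizeFor({ cacheBytes, layerWeightBytes, activationBytesPerToken }, seqLen) {
  const budget = cacheBytes - layerWeightBytes; // cache left over for activations
  if (budget <= 0) return 1;                    // weights alone overflow: process one chunk at a time
  const perSequence = seqLen * activationBytesPerToken;
  return Math.max(1, Math.floor(budget / perSequence));
}

const MiB = 1024 * 1024;
const profile = {
  cacheBytes: 16 * MiB,              // e.g. a 16 MB last-level cache
  layerWeightBytes: 7 * MiB,         // made-up per-layer weight footprint
  activationBytesPerToken: 12 * 1024 // made-up activation cost per token
};

console.log(batchSizeFor(profile, 128));  // short chunks: several sequences per batch
console.log(batchSizeFor(profile, 2048)); // long-sequence tail: the batch collapses to 1
```

The useful property is the shape of the curve: as sequence length grows, the batch shrinks toward 1 instead of silently spilling out of cache.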
|
|
595
|
+
|
|
596
|
+
### 🗜️ Two quantizations — one buys speed, one buys size
|
|
597
|
+
| | **Model weights** · INT8 ORT | **Index vectors** · INT4 binary |
|:--|:--|:--|
| **Job** | build the index faster on CPU | keep the on-disk index tiny |
| **Win** | **~2× faster** indexing · 4× smaller model (**132 MB**) | LI index **1.34 GiB → ~396 MiB** · INT4 nibble-packing halves it again |
| **Fidelity** | **≥ 0.96 cosine** vs FP32 | **no measurable retrieval loss** (A/B-tested vs INT8) |
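
For readers unfamiliar with nibble packing, the sketch below shows one way per-vector min/scale INT4 quantization can work: each value maps to one of 16 levels and two levels share a byte. The byte layout (low nibble first) and the function names are assumptions for illustration, not the on-disk format.

```javascript
// Illustrative INT4 quantizer; nibble order and naming are assumptions.
function quantizeInt4(vec) {
  let min = Infinity, max = -Infinity;
  for (const v of vec) { if (v < min) min = v; if (v > max) max = v; }
  const scale = (max - min) / 15 || 1;          // 16 levels; guard against constant vectors
  const packed = new Uint8Array(Math.ceil(vec.length / 2));
  for (let i = 0; i < vec.length; i++) {
    const q = Math.min(15, Math.max(0, Math.round((vec[i] - min) / scale)));
    packed[i >> 1] |= (i & 1) ? (q << 4) : q;   // even index: low nibble, odd index: high nibble
  }
  return { min, scale, packed, length: vec.length };
}

function dequantizeInt4({ min, scale, packed, length }) {
  const out = new Float32Array(length);
  for (let i = 0; i < length; i++) {
    const byte = packed[i >> 1];
    const q = (i & 1) ? (byte >> 4) : (byte & 0x0f);
    out[i] = min + q * scale;
  }
  return out;
}

const token = [-1, -0.5, 0, 0.25, 0.5, 0.75, 1, 0.1];
const q4 = quantizeInt4(token);
console.log(q4.packed.length, Array.from(dequantizeInt4(q4)));
```

Eight values occupy four bytes, and each reconstructed value lands within half a quantization step of the original.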
|
|
602
|
+
|
|
603
|
+
### 🤖 Two models — both open, both local, both code-specialized
|
|
604
|
+
- **[CodeRankEmbed](https://huggingface.co/nomic-ai/CodeRankEmbed)** — 768-d dense bi-encoder (137M, Apache-2.0) for first-stage recall.
|
|
605
|
+
- **[LateOn-Code](https://huggingface.co/lightonai/LateOn-Code)** — ModernBERT per-token **late interaction** (149M) for the rerank.
|
|
606
|
+
- **Edge fallback for leaner machines:** a **17M `edge` LateOn-Code** (~9× smaller FP32 backbone) auto-selects on low-RAM hosts, and the whole CPU path runs INT8 with **no GPU weights ever downloaded** — full local search on a laptop with no accelerator.
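
The difference between the two models is easiest to see in how they score. A dense bi-encoder compares one vector per chunk; late interaction keeps one vector per token and sums, for each query token, its best match in the document (MaxSim, as introduced by ColBERT). A minimal sketch with made-up toy vectors:

```javascript
// Toy MaxSim scorer. The vectors are invented; real token embeddings are much wider.
function dot(a, b) {
  let s = 0;
  for (let i = 0; i < a.length; i++) s += a[i] * b[i];
  return s;
}

function maxSim(queryTokens, docTokens) {
  let score = 0;
  for (const q of queryTokens) {
    let best = -Infinity;
    for (const d of docTokens) {
      const s = dot(q, d);
      if (s > best) best = s;
    }
    score += best; // each query token contributes its single best document match
  }
  return score;
}

const query = [[1, 0], [0, 1]];
const docA = [[1, 0], [0, 1], [0.5, 0.5]]; // covers both query tokens
const docB = [[1, 0], [0.7, 0.1]];         // covers only the first
console.log(maxSim(query, docA), maxSim(query, docB));
```

`docA` scores higher because every query token finds a strong match, which is the behavior that makes late interaction useful as a reranker. Note that an empty `docTokens` array yields `-Infinity`, so callers should skip empty documents.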
|
|
408
607
|
|
|
409
608
|
<details>
|
|
410
|
-
<summary><b>What's actually custom here</b></summary>
|
|
609
|
+
<summary><b>What's actually custom here — the kernels we hand-wrote</b></summary>
|
|
411
610
|
|
|
412
|
-
- **Surgical attention swap:** we vendor the upstream model implementations (NomicBERT for embeddings, ModernBERT for late interaction) and replace only the attention forward pass — an MLX-ported fused SDPA kernel on Metal, `candle-flash-attn` with varlen packing on CUDA Ampere+, and byte-for-byte upstream math on CPU so the fallback is provably identical.
|
|
611
|
+
- **Surgical attention swap:** we vendor the upstream model implementations (NomicBERT for embeddings, ModernBERT for late interaction) and replace **only the attention forward pass** — an MLX-ported fused SDPA kernel on Metal, `candle-flash-attn` with varlen packing on CUDA Ampere+, and byte-for-byte upstream math on CPU so the fallback is provably identical.
|
|
413
612
|
- **A silent-NaN bug, found and fixed:** Apple's Metal SDPA kernel downcasts attention masks to F16, which saturates the standard `f32::MIN` mask to `-Inf` and quietly produces NaN on padded rows — collapsing retrieval quality. We clamp the mask and serialize Metal command-buffer submissions (concurrent submission corrupts outputs on shared queues). Details in [`crates/sweet-search-native/src/inference/`](crates/sweet-search-native/src/inference/).
|
|
414
613
|
- **CoreML cascade:** 18 pre-traced `.mlpackage` variants (bucketed by sequence length) dispatched to the Apple Neural Engine through an Objective-C shim; oversized batches fall through to Metal. Gated to M3+ because on M1/M2 the ANE doesn't beat its own compile overhead — we measured, so it's off there.
|
|
415
|
-
- **
|
|
416
|
-
- **Pipelined indexing:** while batch *N+1* embeds, batch *N*'s vectors stream into SQLite through zero-copy buffer views; full rebuilds write to a temp file and atomically swap, so a crash never leaves you serving half an index.
|
|
417
|
-
- **Models:** CodeRankEmbed (768-d, code-specialized) for embeddings; LateOn-Code (ModernBERT) for per-token late interaction, in a full-fidelity `standard` and a compact `edge` variant (~9× smaller FP32 backbone; ~2× smaller on the INT8 CPU path).
|
|
614
|
+
- **Structure-routed enrichment:** the preamble (path · scope chain · symbol · siblings · imports) is assembled at index time from a code-graph line-range overlap query — never an LLM call — then routed per language family (full enriched text for JS/Ruby/Go/C-family/Rust, a slimmer path policy for Python and the Java family), every decision settled by per-language ablation rather than a global default.
|
|
615
|
+
- **Pipelined, crash-safe indexing:** while batch *N+1* embeds, batch *N*'s vectors stream into SQLite through zero-copy buffer views; full rebuilds write to a temp file and atomically swap, so a crash never leaves you serving half an index.
|
|
418
616
|
|
|
419
617
|
</details>
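
The silent-NaN failure described in the details above can be reproduced in miniature without any GPU. When an additive mask becomes `-Infinity` and an entire row is masked (as happens on padded rows), the usual max-subtracted softmax computes `-Infinity - (-Infinity)`, which is NaN. The sketch below is a plain-JavaScript illustration; the clamp value used (`-65504`, the most negative finite F16 number) is an assumption picked for the demo, since the README does not state the value the fix uses.

```javascript
// Numerically stable softmax, as typically implemented: subtract the row max first.
function softmax(scores) {
  const max = Math.max(...scores);
  const exps = scores.map((s) => Math.exp(s - max));
  const sum = exps.reduce((a, b) => a + b, 0);
  return exps.map((e) => e / sum);
}

// Additive masking: masked positions get `maskValue` added to their score.
function maskedSoftmax(scores, keep, maskValue) {
  return softmax(scores.map((s, i) => (keep[i] ? s : s + maskValue)));
}

const scores = [0.3, 1.2];
const allPadded = [false, false];

console.log(maskedSoftmax(scores, allPadded, -Infinity)); // every entry is NaN
console.log(maskedSoftmax(scores, allPadded, -65504));    // finite probabilities
console.log(maskedSoftmax(scores, [true, false], -Infinity)); // partially masked rows are fine
```

Only fully masked rows break, which is why the problem is easy to miss: ordinary rows look correct while padded rows quietly poison downstream values.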
|
|
420
618
|
|
|
@@ -464,16 +662,16 @@ Four Rust crates do the heavy lifting, each with a graceful fallback so the engi
|
|
|
464
662
|
|
|
465
663
|
</details>
|
|
466
664
|
|
|
467
|
-
### 🗜️
|
|
665
|
+
### 🗜️ INT4 binary segments: the on-disk format behind the RAM-sized index
|
|
468
666
|
|
|
469
|
-
|
|
470
|
-
|
|
471
|
-
|
|
667
|
+
The quantization headline lives [up in indexing](#-gpu-accelerated-indexing-fully-local) — `1.34 GiB → ~396 MiB`,
|
|
668
|
+
INT4-halved again. Here's the **SSLX** segment format that delivers it: crash-safe by construction, and
|
|
669
|
+
the three-stage retrieval it feeds at query time.
|
|
472
670
|
|
|
473
671
|
<details>
|
|
474
672
|
<summary><b>Deep dive</b></summary>
|
|
475
673
|
|
|
476
|
-
- **INT4 by default:** per-token min/scale quantization with nibble packing (two values per byte), A/B-tested against the INT8 baseline with no meaningful retrieval regression before becoming the default.
|
|
674
|
+
- **INT4 by default:** per-token min/scale quantization with nibble packing (two values per byte), A/B-tested against the INT8 baseline with no meaningful retrieval regression before becoming the default. We borrowed the *rotation insight* from Google's [TurboQuant](docs/LI_QUANTIZATION_STRATEGY.md), but ship plain INT4 — the full TurboQuant algorithm (WHT + PolarQuant + QJL) is researched and deferred, not in the product path.
|
|
477
675
|
- **SSLX binary segments:** the index persists as ~10k-document binary segment files with structured headers and CRC32 footers — a crash costs you at most one segment, not the index.
|
|
478
676
|
- **Three-stage retrieval:** a binary HNSW (Hamming distance over 64-byte binarized vectors, ~32× smaller than float HNSW) produces candidates in ~100 µs, INT8 rescoring narrows them, and a float32 sidecar rescores the final pool — speed without giving up top-result quality.
|
|
479
677
|
- **Memory-mapped HNSW:** the float graph index loads via `mmap` (USearch `view()`), contributing **0 MB** to the V8 heap at search time; the OS reclaims pages under pressure.
|
|
@@ -482,30 +680,6 @@ default packs token vectors at half a byte each on top of that. Laptop-sized, fu
|
|
|
482
680
|
|
|
483
681
|
</details>
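
As a rough illustration of the coarse-to-fine retrieval described in the deep dive, the sketch below binarizes vectors to one sign bit per dimension, ranks by Hamming distance, and then rescoring the survivors with exact float dot products. It deliberately simplifies: a brute-force scan stands in for the binary HNSW graph, the intermediate INT8 rescoring stage is omitted, and all names are hypothetical.

```javascript
// One sign bit per dimension, 8 dimensions per byte (512 dimensions -> 64 bytes).
function binarize(vec) {
  const bytes = new Uint8Array(Math.ceil(vec.length / 8));
  for (let i = 0; i < vec.length; i++) {
    if (vec[i] > 0) bytes[i >> 3] |= 1 << (i & 7);
  }
  return bytes;
}

// Precomputed popcount for every byte value.
const POPCOUNT = new Uint8Array(256);
for (let i = 1; i < 256; i++) POPCOUNT[i] = (i & 1) + POPCOUNT[i >> 1];

function hamming(a, b) {
  let d = 0;
  for (let i = 0; i < a.length; i++) d += POPCOUNT[a[i] ^ b[i]];
  return d;
}

function dot(a, b) {
  let s = 0;
  for (let i = 0; i < a.length; i++) s += a[i] * b[i];
  return s;
}

// Stage 1: cheap Hamming ranking. Final stage: exact float rescoring of the pool.
function search(query, docs, poolSize) {
  const qBits = binarize(query);
  const pool = docs
    .map((doc) => ({ doc, dist: hamming(qBits, binarize(doc.vec)) }))
    .sort((x, y) => x.dist - y.dist)
    .slice(0, poolSize);
  return pool
    .map(({ doc }) => ({ id: doc.id, score: dot(query, doc.vec) }))
    .sort((x, y) => y.score - x.score)
    .map((r) => r.id);
}

const docs = [
  { id: 'a', vec: [1, 1, -1, 0.2] },
  { id: 'b', vec: [0.1, 0.1, -0.1, 3] },
  { id: 'c', vec: [-1, -1, 1, -1] },
];
console.log(search([0.9, 0.8, -0.7, 0.1], docs, 2));
```

The binary stage only has to be good enough to keep the right candidates in the pool; ordering within the pool is decided by the full-precision scores.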
|
|
484
682
|
|
|
485
|
-
## 🎯 The Ranking Stack
|
|
486
|
-
|
|
487
|
-
Retrieval quality comes from *layers*, each one cheap, each one earning its place:
|
|
488
|
-
|
|
489
|
-
1. **Route** — CatBoost classifies the query (lexical / semantic / hybrid) and sets fusion weights; real file paths short-circuit straight to lexical
|
|
490
|
-
2. **Retrieve** — BM25F field-weighted lexical (a match on a function's *name* outranks one buried in a body) in parallel with the three-stage vector pipeline
|
|
491
|
-
3. **Fuse** — convex combination with per-route weights and quantile normalization, falling back to Reciprocal Rank Fusion on degenerate score distributions
|
|
492
|
-
4. **Anchor** — name a real symbol in your query and identifier-anchored retrieval injects the exact-name entity, even when the encoder ranked something tangential higher
|
|
493
|
-
5. **Rerank** — ColBERT-style MaxSim late interaction over the quantized token index
|
|
494
|
-
6. **Expand** — typed-edge graph walks (1–2 hops, intent-adaptive, PathRAG-style flow pruning) pull in the related code a single chunk can't show
|
|
495
|
-
7. **Polish** — intent-aware demotion of docs/tests/config when you want implementation, call-site reference boosts, MMR diversity, near-duplicate sibling re-ranking
|
|
496
|
-
|
|
497
|
-
<details>
|
|
498
|
-
<summary><b>Deep dive & design honesty</b></summary>
|
|
499
|
-
|
|
500
|
-
- **Intent awareness:** a lightweight classifier distinguishes "fix this crash" from "how do I use this API" and tunes graph-edge selection, result limits, and chunk-type preferences per intent.
|
|
501
|
-
- **Quality priors:** each chunk carries a 0–1 prior from test proximity, git recency, symbol centrality (PageRank), comment density, and complexity — production code surfaces, stale fixtures sink.
|
|
502
|
-
- **Community structure:** a canonical Leiden algorithm detects code communities on the entity graph at index time, feeding vocabulary prewarming and structural signals — the engine understands your modules, not just your directories.
|
|
503
|
-
- **Multilingual:** 14 languages get full tree-sitter AST treatment; a 39-config registry covers 70+ extensions beyond that; router features handle camelCase/snake_case decomposition, CJK density, and German compounds.
|
|
504
|
-
- **Long-query rescue:** wordy natural-language queries that FTS5 would tokenize into an unsatisfiable AND get a multi-query BM25F + RRF fallback — one query per content keyword, fused.
|
|
505
|
-
- **A negative result we ship anyway:** we built a full cross-encoder rerank cascade behind an adaptive confidence gate, measured it on our evaluation sets — and it didn't beat MaxSim at 3× the latency. So it ships **disabled** (`SWEET_SEARCH_CASCADE_ENABLED=true` if you want to try). We'd rather ship the faster path than a fancier diagram.
|
|
506
|
-
|
|
507
|
-
</details>
|
|
508
|
-
|
|
509
683
|
## 🔌 Works With Your Agent
|
|
510
684
|
|
|
511
685
|
sweet-search meets your agent wherever it is — shell tools, MCP, or injected instructions:
|
|
@@ -565,6 +739,7 @@ sweet-search stands on a lot of shoulders, and we'd rather name them than preten
|
|
|
565
739
|
- **[CatBoost](https://catboost.ai/)** — the query router model; **Traag et al.** for the [Leiden algorithm](https://arxiv.org/abs/1810.08473); **Cormack et al.** for RRF; **[PathRAG](https://arxiv.org/abs/2502.14902)** for flow-pruned graph expansion; **[cAST](https://arxiv.org/abs/2506.15655)** for structure-aware chunking
|
|
566
740
|
- **[GEPA](https://arxiv.org/abs/2507.19457)** — the reflective evolutionary prompt-optimization paradigm behind our agent prompt
|
|
567
741
|
- **[nomic-ai](https://huggingface.co/nomic-ai)** — the CodeRankEmbed embedding model
|
|
742
|
+
- **[Anthropic](https://www.anthropic.com/news/contextual-retrieval)** — the Contextual Retrieval idea behind our chunk enrichment, here derived from code structure instead of an LLM summary
|
|
568
743
|
|
|
569
744
|
## 📄 License
|
|
570
745
|
|
|
Binary file
|
|
@@ -0,0 +1,209 @@
|
|
|
1
|
+
/**
|
|
2
|
+
* Animated terminal banner — universal renderer.
|
|
3
|
+
*
|
|
4
|
+
* Decodes the pre-baked lossless WebP sprite-sheet (assets/banner/) once with sharp,
|
|
5
|
+
* then animates it via the best path the terminal supports:
|
|
6
|
+
*
|
|
7
|
+
* Kitty graphics (Ghostty, kitty, WezTerm, Konsole) — pixel-perfect
|
|
8
|
+
* iTerm2 inline (iTerm.app) — pixel-perfect
|
|
9
|
+
* Sixel (Windows Terminal, Konsole, xterm, foot…) — crisp raster
|
|
10
|
+
* half-blocks (everything else: Apple Terminal, VS Code, SSH, CI-tty) — truecolor ▀
|
|
11
|
+
*
|
|
12
|
+
* Zero prerequisites for users: the only dependency is sharp (already a package dep,
|
|
13
|
+
* auto-installed per-platform by npm). Chrome is used ONLY at bake time (scripts/bake-banner.mjs).
|
|
14
|
+
*
|
|
15
|
+
* Safe by construction: renders only to an interactive TTY, never throws, honours
|
|
16
|
+
* NO_BANNER / SWEET_SEARCH_NO_BANNER / CI, and leaves the final frame in place when done.
|
|
17
|
+
*/
|
|
18
|
+
import { createRequire } from 'node:module';
|
|
19
|
+
import { fileURLToPath } from 'node:url';
|
|
20
|
+
import { readFileSync } from 'node:fs';
|
|
21
|
+
import { dirname, join } from 'node:path';
|
|
22
|
+
import { encodeSixelFrame, buildPalette, makeMapper, sampleColors } from './sixel.js';
|
|
23
|
+
|
|
24
|
+
const require = createRequire(import.meta.url);
|
|
25
|
+
|
|
26
|
+
const ASSET_DIR = join(dirname(fileURLToPath(import.meta.url)), '..', '..', 'assets', 'banner');
|
|
27
|
+
const ESC = '\x1b', BEL = '\x07', ST = ESC + '\\';
|
|
28
|
+
const SYNC_ON = `${ESC}[?2026h`, SYNC_OFF = `${ESC}[?2026l`;
|
|
29
|
+
const sleep = (ms) => new Promise(r => setTimeout(r, ms));
|
|
30
|
+
|
|
31
|
+
// ---------------- gating ----------------
|
|
32
|
+
function shouldRender(stream, env) {
|
|
33
|
+
if (env.SWEET_SEARCH_NO_BANNER || env.NO_BANNER) return false;
|
|
34
|
+
if (env.CI) return false;
|
|
35
|
+
if (!stream || !stream.isTTY) return false;
|
|
36
|
+
if ((env.TERM || '') === 'dumb') return false;
|
|
37
|
+
return true;
|
|
38
|
+
}
|
|
39
|
+
|
|
40
|
+
// ---------------- terminal capability query (DA1 + text-area size) ----------------
|
|
41
|
+
function queryTerminal(stream, env, timeout = 140) {
|
|
42
|
+
return new Promise((resolve) => {
|
|
43
|
+
const stdin = process.stdin;
|
|
44
|
+
if (env.SS_NO_QUERY || !stdin || !stdin.isTTY) { resolve({}); return; }
|
|
45
|
+
let buf = '', done = false, t;
|
|
46
|
+
const finish = () => {
|
|
47
|
+
if (done) return; done = true;
|
|
48
|
+
try { stdin.removeListener('data', onData); stdin.setRawMode(false); stdin.pause(); } catch { /* noop */ }
|
|
49
|
+
clearTimeout(t);
|
|
50
|
+
const da1 = (buf.match(/\x1b\[\?([0-9;]+)c/) || [])[1] || '';
|
|
51
|
+
const size = buf.match(/\x1b\[4;(\d+);(\d+)t/);
|
|
52
|
+
resolve({ sixel: da1.split(';').includes('4'), areaH: size ? +size[1] : 0, areaW: size ? +size[2] : 0 });
|
|
53
|
+
};
|
|
54
|
+
const onData = (d) => { buf += d.toString('latin1'); if (/\x1b\[\?[0-9;]+c/.test(buf) && (/\x1b\[4;\d+;\d+t/.test(buf) || buf.length > 64)) finish(); };
|
|
55
|
+
try { stdin.setRawMode(true); stdin.resume(); stdin.on('data', onData); } catch { resolve({}); return; }
|
|
56
|
+
stream.write(`${ESC}[14t${ESC}[c`); // text-area pixel size, then primary device attributes
|
|
57
|
+
t = setTimeout(finish, timeout);
|
|
58
|
+
});
|
|
59
|
+
}
|
|
60
|
+
|
|
61
|
+
function detectProto(env, caps) {
|
|
62
|
+
if (env.SS_PROTO) return env.SS_PROTO;
|
|
63
|
+
const tp = env.TERM_PROGRAM || '';
|
|
64
|
+
if (env.KITTY_WINDOW_ID || env.GHOSTTY_RESOURCES_DIR || tp === 'ghostty' || tp === 'WezTerm' || /kitty/i.test(env.TERM || '')) return 'kitty';
|
|
65
|
+
if (tp === 'iTerm.app') return 'iterm';
|
|
66
|
+
if (caps.sixel || env.WT_SESSION || /foot|contour|mlterm/i.test(env.TERM || '')) return 'sixel';
|
|
67
|
+
return 'blocks';
|
|
68
|
+
}
|
|
69
|
+
|
|
70
|
+
// ---------------- color (truecolor default; 256 fallback) ----------------
|
|
71
|
+
function rgb256(r, g, b) {
|
|
72
|
+
if (Math.abs(r - g) < 10 && Math.abs(g - b) < 10) { if (r < 8) return 16; if (r > 248) return 231; return 232 + Math.round(((r - 8) / 247) * 24); }
|
|
73
|
+
return 16 + 36 * Math.round(r / 255 * 5) + 6 * Math.round(g / 255 * 5) + Math.round(b / 255 * 5);
|
|
74
|
+
}
|
|
75
|
+
const GHALF = [' ', '▀', '▄', '█'];
|
|
76
|
+
function sextantChar(v) {
|
|
77
|
+
if (v === 0) return ' '; if (v === 63) return '█'; if (v === 21) return '▌'; if (v === 42) return '▐';
|
|
78
|
+
return String.fromCodePoint(0x1FB00 + (v - 1 - (v > 21 ? 1 : 0) - (v > 42 ? 1 : 0)));
|
|
79
|
+
}
|
|
80
|
+
|
|
81
|
+
// ---------------- main ----------------
|
|
82
|
+
export async function showBanner(opts = {}) {
|
|
83
|
+
const env = opts.env || process.env;
|
|
84
|
+
const out = opts.stream || process.stdout;
|
|
85
|
+
let onSig = null;
|
|
86
|
+
try {
|
|
87
|
+
if (!shouldRender(out, env)) return false;
|
|
88
|
+
|
|
89
|
+
const sharp = require('sharp');
|
|
90
|
+
const man = JSON.parse(readFileSync(join(ASSET_DIR, 'banner-manifest.json'), 'utf8'));
|
|
91
|
+
const { gridCols: GC, cellW: CW, cellH: CH, count: N, frameMs } = man;
|
|
92
|
+
const maxMs = opts.maxMs ?? Number(env.SS_BANNER_MS || 2600);
|
|
93
|
+
const color = opts.color || env.SS_COLOR || 'truecolor';
|
|
94
|
+
const cellMode = opts.cells || env.SS_CELLS || 'half';
|
|
95
|
+
|
|
96
|
+
const caps = opts.query === false ? {} : await queryTerminal(out, env);
|
|
97
|
+
const proto = opts.proto || detectProto(env, caps);
|
|
98
|
+
const capCols = Number(opts.cols || env.SS_COLS || 120);
|
|
99
|
+
const cols = Math.max(20, Math.min((out.columns || 80) - 2, capCols));
|
|
100
|
+
|
|
101
|
+
// decode the sprite sheet once
|
|
102
|
+
const { data: big, info } = await sharp(join(ASSET_DIR, man.file)).raw().toBuffer({ resolveWithObject: true });
|
|
103
|
+
const BW = info.width, ch = info.channels;
|
|
104
|
+
const frameRaw = (i) => {
|
|
105
|
+
const gx = (i % GC) * CW, gy = Math.floor(i / GC) * CH, f = Buffer.allocUnsafe(CW * CH * 4);
|
|
106
|
+
for (let y = 0; y < CH; y++) { const so = ((gy + y) * BW + gx) * ch; big.copy(f, y * CW * 4, so, so + CW * 4); }
|
|
107
|
+
return f;
|
|
108
|
+
};
|
|
109
|
+
|
|
110
|
+
const fg = color === '256' ? (r, g, b) => `${ESC}[38;5;${rgb256(r, g, b)}m` : (r, g, b) => `${ESC}[38;2;${r};${g};${b}m`;
|
|
111
|
+
const bg = color === '256' ? (r, g, b) => `${ESC}[48;5;${rgb256(r, g, b)}m` : (r, g, b) => `${ESC}[48;2;${r};${g};${b}m`;
|
|
112
|
+
const RESET = `${ESC}[0m`;
|
|
113
|
+
|
|
114
|
+
// ---------------- per-protocol lazy frame builder (build only frames we show) ----------------
|
|
115
|
+
let renderRows = Math.max(4, Math.round(cols / 6));
|
|
116
|
+
const cache = new Map();
|
|
117
|
+
let buildOne;
|
|
118
|
+
|
|
119
|
+
if (proto === 'kitty' || proto === 'iterm') {
|
|
120
|
+
buildOne = async (i) => (await sharp(frameRaw(i), { raw: { width: CW, height: CH, channels: 4 } }).png().toBuffer()).toString('base64');
|
|
121
|
+
} else if (proto === 'sixel') {
|
|
122
|
+
const cellWpx = caps.areaW && out.columns ? caps.areaW / out.columns : 8;
|
|
123
|
+
const Wp = Math.min(1000, Math.max(360, Math.round(cols * cellWpx)));
|
|
124
|
+
const Hp = Math.round(Wp / 3);
|
|
125
|
+
const cellHpx = caps.areaH && out.rows ? caps.areaH / out.rows : 16;
|
|
126
|
+
renderRows = Math.max(4, Math.round(Hp / cellHpx));
|
|
127
|
+
const sixelPx = async (i) => (await sharp(frameRaw(i), { raw: { width: CW, height: CH, channels: 4 } }).resize(Wp, Hp, { fit: 'fill' }).raw().toBuffer());
|
|
128
|
+
// global palette from 2 representative frames (colours are stable across the loop)
|
|
129
|
+
const samp = [...sampleColors(await sixelPx(0), 9), ...sampleColors(await sixelPx((N / 2) | 0), 9)];
|
|
130
|
+
const palette = buildPalette(samp, 255), mapper = makeMapper(palette);
|
|
131
|
+
buildOne = async (i) => encodeSixelFrame(await sixelPx(i), Wp, Hp, palette, mapper);
|
|
132
|
+
} else {
|
|
133
|
+
const [sc, sr] = cellMode === 'sextant' ? [2, 3] : [1, 2];
|
|
134
|
+
const R = Math.max(3, Math.round((cols * sc) * CH / CW / sr)); renderRows = R;
|
|
135
|
+
const W = cols * sc, H = R * sr;
|
|
136
|
+
const glyph = cellMode === 'sextant' ? sextantChar : (v) => GHALF[v];
|
|
137
|
+
buildOne = async (i) => {
|
|
138
|
+
const { data: px } = await sharp(frameRaw(i), { raw: { width: CW, height: CH, channels: 4 } }).resize(W, H, { fit: 'fill', kernel: 'lanczos3' }).sharpen({ sigma: 0.7 }).raw().toBuffer({ resolveWithObject: true });
|
|
139
|
+
let s = '';
|
|
140
|
+
for (let cy = 0; cy < R; cy++) {
|
|
141
|
+
for (let cx = 0; cx < cols; cx++) {
|
|
142
|
+
const sub = []; let anyT = false, anyO = false;
|
|
143
|
+
for (let y = 0; y < sr; y++) for (let x = 0; x < sc; x++) { const o = ((cy * sr + y) * W + (cx * sc + x)) * 4, a = px[o + 3]; sub.push({ r: px[o], g: px[o + 1], b: px[o + 2], a }); if (a < 128) anyT = true; else anyO = true; }
|
|
144
|
+
if (!anyO) { s += RESET + ' '; continue; }
|
|
145
|
+
const op = sub.filter(p => p.a >= 128), lum = p => 0.299 * p.r + 0.587 * p.g + 0.114 * p.b;
|
|
146
|
+
let lo = op[0], hi = op[0];
|
|
147
|
+
for (const p of op) { if (lum(p) < lum(lo)) lo = p; if (lum(p) > lum(hi)) hi = p; }
|
|
148
|
+
const dist = (p, q) => (p.r - q.r) ** 2 + (p.g - q.g) ** 2 + (p.b - q.b) ** 2;
|
|
149
|
+
let fr = 0, fG = 0, fb = 0, fn = 0, br = 0, bG = 0, bb = 0, bn = 0, bits = 0;
|
|
150
|
+
for (let k = 0; k < sub.length; k++) { const p = sub[k]; if (p.a < 128) continue; if (dist(p, hi) <= dist(p, lo)) { bits |= (1 << k); fr += p.r; fG += p.g; fb += p.b; fn++; } else { br += p.r; bG += p.g; bb += p.b; bn++; } }
|
|
151
|
+
const fc = fn ? [Math.round(fr / fn), Math.round(fG / fn), Math.round(fb / fn)] : null;
|
|
152
|
+
const bc = bn ? [Math.round(br / bn), Math.round(bG / bn), Math.round(bb / bn)] : null;
|
|
153
|
+
const full = (1 << sub.length) - 1;
|
|
154
|
+
if (anyT) s += bits === 0 ? RESET + ' ' : RESET + fg(...(fc || bc)) + glyph(bits);
|
|
155
|
+
else if (bits === 0) s += bg(...bc) + ' ';
|
|
156
|
+
else if (bits === full) s += fg(...fc) + glyph(bits);
|
|
157
|
+
else s += bg(...bc) + fg(...fc) + glyph(bits);
|
|
158
|
+
}
|
|
159
|
+
s += RESET; if (cy < R - 1) s += '\r\n';
|
|
160
|
+
}
|
|
161
|
+
return s;
|
|
162
|
+
};
|
|
163
|
+
}
|
|
164
|
+
const getFrame = async (i) => { let v = cache.get(i); if (v === undefined) { v = await buildOne(i); cache.set(i, v); } return v; };
|
|
165
|
+
|
|
166
|
+
// ---------------- emit / animate (bounded) ----------------
|
|
167
|
+
const CHUNK = 4096;
|
|
168
|
+
const kittyFrame = (b64, id) => {
|
|
169
|
+
if (b64.length <= CHUNK) { out.write(`${ESC}_Gf=100,a=T,q=2,c=${cols},C=1,i=${id};${b64}${ST}`); return; }
|
|
170
|
+
for (let i = 0; i < b64.length; i += CHUNK) { const piece = b64.slice(i, i + CHUNK), more = i + CHUNK < b64.length ? 1 : 0; out.write(i === 0 ? `${ESC}_Gf=100,a=T,q=2,c=${cols},C=1,i=${id},m=1;${piece}${ST}` : `${ESC}_Gm=${more};${piece}${ST}`); }
|
|
171
|
+
};
|
|
172
|
+
const kittyDel = (id) => out.write(`${ESC}_Ga=d,d=i,i=${id},q=2${ST}`);
|
|
173
|
+
|
|
174
|
+
await getFrame(0); // build first frame before clearing space (hide latency)
|
|
175
|
+
out.write(ESC + '[?25l');
|
|
176
|
+
// Restore the cursor if interrupted mid-animation (else Ctrl-C leaves it hidden).
|
|
177
|
+
onSig = () => { try { out.write(`${ESC}[0m${ESC}[?25h`); } catch { /* noop */ } process.exit(130); };
|
|
178
|
+
process.once('SIGINT', onSig);
|
|
179
|
+
if (proto === 'blocks' || proto === 'sixel') { for (let r = 0; r < renderRows; r++) out.write('\n'); out.write(`${ESC}[${renderRows}A`); }
|
|
180
|
+
else out.write('\n');
|
|
181
|
+
out.write(ESC + '7');
|
|
182
|
+
|
|
183
|
+
let prevId = 0, flip = 1;
|
|
184
|
+
const start = Date.now();
|
|
185
|
+
let i = 0;
|
|
186
|
+
while (Date.now() - start < maxMs) {
|
|
187
|
+
const frame = await getFrame(i);
|
|
188
|
+
out.write(SYNC_ON + ESC + '8');
|
|
189
|
+
if (proto === 'kitty') { const id = flip; kittyFrame(frame, id); if (prevId) kittyDel(prevId); prevId = id; flip = flip === 1 ? 2 : 1; }
|
|
190
|
+
else if (proto === 'iterm') out.write(`${ESC}]1337;File=inline=1;width=${cols};height=${renderRows};preserveAspectRatio=1:${frame}${BEL}`);
|
|
191
|
+
else out.write(frame);
|
|
192
|
+
out.write(SYNC_OFF);
|
|
193
|
+
await sleep(frameMs);
|
|
194
|
+
i = (i + 1) % N;
|
|
195
|
+
}
|
|
196
|
+
out.write(ESC + '8'); // settle: leave final frame, move below
|
|
197
|
+
out.write('\n'.repeat(renderRows + 1));
|
|
198
|
+
out.write(RESET + ESC + '[?25h');
|
|
199
|
+
if (onSig) process.removeListener('SIGINT', onSig); // hand Ctrl-C back to the caller
|
|
200
|
+
return true;
|
|
201
|
+
} catch (err) {
|
|
202
|
+
if (onSig) process.removeListener('SIGINT', onSig);
|
|
203
|
+
if (env.SS_BANNER_DEBUG) process.stderr.write(`[banner] ${err && err.stack || err}\n`);
|
|
204
|
+
try { out.write(`${ESC}[0m${ESC}[?25h`); } catch { /* noop */ }
|
|
205
|
+
return false;
|
|
206
|
+
}
|
|
207
|
+
}
|
|
208
|
+
|
|
209
|
+
export default showBanner;
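
Reviewer's sketch (not part of the package): the capability probe in `queryTerminal` boils down to parsing two terminal replies, the Primary Device Attributes response (`ESC [ ? … c`, where attribute `4` advertises Sixel) and the text-area size report (`ESC [ 4 ; height ; width t`). Pulled out as a pure function, that parsing can be exercised without a TTY. The function name `parseCaps` is hypothetical; the regular expressions mirror the ones in the file above.

```javascript
// Pure version of the reply parsing done inside queryTerminal's finish() callback.
function parseCaps(buf) {
  const da1 = (buf.match(/\x1b\[\?([0-9;]+)c/) || [])[1] || '';
  const size = buf.match(/\x1b\[4;(\d+);(\d+)t/);
  return {
    sixel: da1.split(';').includes('4'), // exact match, so "14" does not count as "4"
    areaH: size ? +size[1] : 0,
    areaW: size ? +size[2] : 0,
  };
}

console.log(parseCaps('\x1b[4;600;960t\x1b[?62;4;22c'));
console.log(parseCaps('\x1b[?1;2c'));
```

Splitting on `;` before checking for `4` matters: a substring test would wrongly treat attribute `14` as Sixel support.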
|
|
@@ -0,0 +1,126 @@
|
|
|
1
|
+
/**
|
|
2
|
+
* Minimal, dependency-free Sixel encoder for the terminal banner.
|
|
3
|
+
*
|
|
4
|
+
* Sixel renders crisp raster images on Windows Terminal (>=1.22), Konsole, xterm,
|
|
5
|
+
* foot, Contour and much of Linux — terminals that lack the Kitty graphics protocol.
|
|
6
|
+
*
|
|
7
|
+
* We build ONE shared palette across all frames (pixel art reuses colours, so a global
|
|
8
|
+
* palette is both faster and lets frames share colour definitions) and emit each frame
|
|
9
|
+
* with run-length-encoded sixel bands. Fully-transparent pixels are left unset so the
|
|
10
|
+
* terminal background shows through (rounded banner corners).
|
|
11
|
+
*/
|
|
12
|
+
|
|
13
|
+
const TRANSPARENT = -1;
|
|
14
|
+
|
|
15
|
+
function boxStats(box) {
|
|
16
|
+
let rMin = 255, rMax = 0, gMin = 255, gMax = 0, bMin = 255, bMax = 0;
|
|
17
|
+
for (const [r, g, b] of box) {
|
|
18
|
+
if (r < rMin) rMin = r; if (r > rMax) rMax = r;
|
|
19
|
+
if (g < gMin) gMin = g; if (g > gMax) gMax = g;
|
|
20
|
+
if (b < bMin) bMin = b; if (b > bMax) bMax = b;
|
|
21
|
+
}
|
|
22
|
+
const dr = rMax - rMin, dg = gMax - gMin, db = bMax - bMin;
|
|
23
|
+
const range = Math.max(dr, dg, db);
|
|
24
|
+
const channel = range === dr ? 0 : range === dg ? 1 : 2;
|
|
25
|
+
return { range, channel };
|
|
26
|
+
}
|
|
27
|
+
function avgColor(box) {
|
|
28
|
+
let r = 0, g = 0, b = 0;
|
|
29
|
+
for (const c of box) { r += c[0]; g += c[1]; b += c[2]; }
|
|
30
|
+
const n = box.length || 1;
|
|
31
|
+
return [Math.round(r / n), Math.round(g / n), Math.round(b / n)];
|
|
32
|
+
}
|
|
33
|
+
|
|
34
|
+
/** Median-cut quantisation over a sample of opaque colours. Returns up to maxColors [r,g,b]. */
|
|
35
|
+
export function buildPalette(samples, maxColors = 255) {
|
|
36
|
+
if (samples.length === 0) return [[0, 0, 0]];
|
|
37
|
+
let boxes = [samples];
|
|
38
|
+
while (boxes.length < maxColors) {
|
|
39
|
+
let bi = -1, best = -1;
|
|
40
|
+
for (let i = 0; i < boxes.length; i++) {
|
|
41
|
+
if (boxes[i].length < 2) continue;
|
|
42
|
+
const { range } = boxStats(boxes[i]);
|
|
43
|
+
if (range > best) { best = range; bi = i; }
|
|
44
|
+
}
|
|
45
|
+
if (bi < 0) break;
|
|
46
|
+
const box = boxes[bi];
|
|
47
|
+
const { channel } = boxStats(box);
|
|
48
|
+
box.sort((a, b) => a[channel] - b[channel]);
|
|
49
|
+
const mid = box.length >> 1;
|
|
50
|
+
boxes.splice(bi, 1, box.slice(0, mid), box.slice(mid));
|
|
51
|
+
}
|
|
52
|
+
return boxes.map(avgColor);
|
|
53
|
+
}
|
|
54
|
+
|
|
55
|
+
/** Fast rgb->palette-index mapper with a per-colour cache (pixel art has few unique colours). */
|
|
56
|
+
export function makeMapper(palette) {
|
|
57
|
+
const cache = new Map();
|
|
58
|
+
return (r, g, b) => {
|
|
59
|
+
const key = (r << 16) | (g << 8) | b;
|
|
60
|
+
const hit = cache.get(key);
|
|
61
|
+
if (hit !== undefined) return hit;
|
|
62
|
+
let best = 0, bestD = Infinity;
|
|
63
|
+
for (let i = 0; i < palette.length; i++) {
|
|
64
|
+
const p = palette[i];
|
|
65
|
+
const d = (p[0] - r) ** 2 + (p[1] - g) ** 2 + (p[2] - b) ** 2;
|
|
66
|
+
if (d < bestD) { bestD = d; best = i; }
|
|
67
|
+
}
|
|
68
|
+
cache.set(key, best);
|
|
69
|
+
return best;
|
|
70
|
+
};
|
|
71
|
+
}
|
|
72
|
+
|
|
73
|
+
/** Collect a colour sample from a raw RGBA frame (every `step`-th opaque pixel). */
|
|
74
|
+
export function sampleColors(rgba, step = 7) {
|
|
75
|
+
const out = [];
|
|
76
|
+
for (let i = 0; i < rgba.length; i += 4 * step) {
|
|
77
|
+
if (rgba[i + 3] >= 128) out.push([rgba[i], rgba[i + 1], rgba[i + 2]]);
|
|
78
|
+
}
|
|
79
|
+
return out;
|
|
80
|
+
}
|
|
81
|
+
|
|
82
|
+
/** Encode one RGBA frame to a Sixel string using a shared palette + mapper. */
|
|
83
|
+
export function encodeSixelFrame(rgba, width, height, palette, mapper) {
|
|
84
|
+
// index buffer: palette index per pixel, or TRANSPARENT
|
|
85
|
+
const idx = new Int16Array(width * height);
|
|
86
|
+
for (let p = 0, o = 0; p < idx.length; p++, o += 4) {
|
|
87
|
+
idx[p] = rgba[o + 3] < 128 ? TRANSPARENT : mapper(rgba[o], rgba[o + 1], rgba[o + 2]);
|
|
88
|
+
}
|
|
89
|
+
|
|
90
|
+
let s = '\x1bP0;1;0q'; // DCS; P2=1 => 0-bit pixels stay transparent
|
|
91
|
+
s += `"1;1;${width};${height}`; // raster attributes (1:1 aspect)
|
|
92
|
+
for (let i = 0; i < palette.length; i++) {
|
|
93
|
+
const [r, g, b] = palette[i];
|
|
94
|
+
s += `#${i};2;${Math.round(r / 255 * 100)};${Math.round(g / 255 * 100)};${Math.round(b / 255 * 100)}`;
|
|
95
|
+
}
|
|
96
|
+
|
|
97
|
+
const used = [];
|
|
98
|
+
for (let by = 0; by < height; by += 6) {
|
|
99
|
+
const bandH = Math.min(6, height - by);
|
|
100
|
+
// which palette indices appear in this band
|
|
101
|
+
used.length = 0;
|
|
102
|
+
const seen = new Set();
|
|
103
|
+
for (let y = by; y < by + bandH; y++) {
|
|
104
|
+
const row = y * width;
|
|
105
|
+
for (let x = 0; x < width; x++) { const c = idx[row + x]; if (c !== TRANSPARENT && !seen.has(c)) { seen.add(c); used.push(c); } }
|
|
106
|
+
}
|
|
107
|
+
for (let u = 0; u < used.length; u++) {
|
|
108
|
+
const ci = used[u];
|
|
109
|
+
if (u > 0) s += '$'; // return to band start, overlay next colour
|
|
110
|
+
s += `#${ci}`;
|
|
111
|
+
let runChar = -1, runLen = 0, line = '';
|
|
112
|
+
for (let x = 0; x < width; x++) {
|
|
113
|
+
let bits = 0;
|
|
114
|
+
for (let k = 0; k < bandH; k++) if (idx[(by + k) * width + x] === ci) bits |= (1 << k);
|
|
115
|
+
const ch = 63 + bits;
|
|
116
|
+
if (ch === runChar) { runLen++; }
|
|
117
|
+
else { if (runLen) line += runLen > 3 ? `!${runLen}${String.fromCharCode(runChar)}` : String.fromCharCode(runChar).repeat(runLen); runChar = ch; runLen = 1; }
|
|
118
|
+
}
|
|
119
|
+
if (runLen) line += runLen > 3 ? `!${runLen}${String.fromCharCode(runChar)}` : String.fromCharCode(runChar).repeat(runLen);
|
|
120
|
+
s += line;
|
|
121
|
+
}
|
|
122
|
+
s += '-'; // graphics newline
|
|
123
|
+
}
|
|
124
|
+
s += '\x1b\\'; // ST
|
|
125
|
+
return s;
|
|
126
|
+
}
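
Reviewer's sketch (not part of the package): the run-length step inside `encodeSixelFrame` is easier to follow in isolation. Each column of a six-pixel band becomes one character (`63 + bits`), and runs longer than three repeat as `!<count><char>`. The helper name `rleSixelRow` is hypothetical; the logic mirrors the inner loop above.

```javascript
// Run-length encode one colour pass of a sixel band.
// `bitsPerColumn[x]` is a 6-bit mask of which rows in the band use this colour.
function rleSixelRow(bitsPerColumn) {
  let out = '', runChar = -1, runLen = 0;
  const flush = () => {
    if (!runLen) return;
    const c = String.fromCharCode(runChar);
    out += runLen > 3 ? `!${runLen}${c}` : c.repeat(runLen); // short runs are cheaper written literally
  };
  for (const bits of bitsPerColumn) {
    const ch = 63 + bits; // sixel data characters span '?' (0) to '~' (63)
    if (ch === runChar) runLen++;
    else { flush(); runChar = ch; runLen = 1; }
  }
  flush();
  return out;
}

console.log(rleSixelRow([63, 63, 63, 63, 63, 0, 0, 1]));
```

Five full columns collapse to `!5~`, while the two empty columns stay as `??` because a repeat introducer would not save any bytes there.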
|
|
@@ -121,6 +121,11 @@ async function main() {
|
|
|
121
121
|
setVerboseMode(true);
|
|
122
122
|
}
|
|
123
123
|
|
|
124
|
+
// Animated banner (best-effort; interactive TTY only, never in CI / quiet / stdin-fed runs).
|
|
125
|
+
if (!quiet && !help && !filesFromStdin && process.stdout.isTTY && !process.env.CI && !process.env.NO_BANNER && !process.env.SWEET_SEARCH_NO_BANNER) {
|
|
126
|
+
try { const { showBanner } = await import('../banner/render-banner.js'); await showBanner(); } catch { /* non-fatal */ }
|
|
127
|
+
}
|
|
128
|
+
|
|
124
129
|
// Apply late interaction model overrides before any model code runs.
|
|
125
130
|
// Precedence: --no-late-interaction > --late-interaction-model=… > env
|
|
126
131
|
// var (already honoured by LATE_INTERACTION_CONFIG.model at module load) >
|
package/core/indexing/indexer-ann.js
CHANGED
|
@@ -650,7 +650,7 @@ export async function buildHNSWIndex(dbPath, dryRun = false) {
|
|
|
650
650
|
});
|
|
651
651
|
fsyncFile(sidecarPath);
|
|
652
652
|
fsyncDirectory(path.dirname(checkpointPath));
|
|
653
|
-
log(` checkpoint: ${added}/${totalVectors} vectors`, 'dim');
|
|
653
|
+
if (process.env.DEBUG) log(` checkpoint: ${added}/${totalVectors} vectors`, 'dim');
|
|
654
654
|
}
|
|
655
655
|
lastCheckpointTime = Date.now();
|
|
656
656
|
vectorsSinceCheckpoint = 0;
|
package/core/indexing/indexer-utils.js
CHANGED
|
@@ -109,32 +109,64 @@ export function isVerboseMode() {
|
|
|
109
109
|
return verboseMode;
|
|
110
110
|
}
|
|
111
111
|
|
|
112
|
+
// ---------------------------------------------------------------------------
|
|
113
|
+
// Progress rendering — an in-place "sticky" bar that animates as a phase runs.
|
|
114
|
+
//
|
|
115
|
+
// On a TTY (verbose or not) the bar redraws on a single line via carriage return
|
|
116
|
+
// + erase-to-EOL, with smooth 1/8-block fill. While a bar is active, log() pins it:
|
|
117
|
+
// it clears the bar, prints the log line above, then redraws the bar below — so
|
|
118
|
+
// interleaved diagnostics (e.g. the HNSW "checkpoint:" line) never split the bar.
|
|
119
|
+
// Non-TTY (pipes / CI) falls back to throttled newlines so nothing is swallowed.
|
|
120
|
+
// ---------------------------------------------------------------------------
|
|
121
|
+
const BAR_WIDTH = 30;
|
|
122
|
+
const LABEL_COL = 17; // pad "Label:" to this width so every bar's [ ] aligns
|
|
123
|
+
const SUB_BLOCKS = ['', '▏', '▎', '▍', '▌', '▋', '▊', '▉']; // eighth-block partial fills
|
|
124
|
+
const CLEAR_EOL = '\x1b[K';
|
|
125
|
+
let activeBar = null; // last-rendered bar string while a phase is in progress (TTY only)
|
|
126
|
+
let lastLoggedPercent = {};
|
|
127
|
+
|
|
128
|
+
function renderBar(current, total, label) {
|
|
129
|
+
const ratio = total > 0 ? Math.max(0, Math.min(1, current / total)) : 1;
|
|
130
|
+
const eighths = Math.round(ratio * BAR_WIDTH * 8);
|
|
131
|
+
const full = Math.floor(eighths / 8);
|
|
132
|
+
const partial = SUB_BLOCKS[eighths % 8];
|
|
133
|
+
const bar = '█'.repeat(full) + partial;
|
|
134
|
+
const empty = '░'.repeat(Math.max(0, BAR_WIDTH - full - (partial ? 1 : 0)));
|
|
135
|
+
const head = `${label}:`.padEnd(LABEL_COL); // right border aligns across phases
|
|
136
|
+
const pct = (ratio * 100).toFixed(1).padStart(5);
|
|
137
|
+
return `${colors.cyan}${head}[${bar}${empty}] ${pct}% (${current}/${total})${colors.reset}`;
|
|
138
|
+
}
|
|
139
|
+
|
|
112
140
|
export function log(message, color = 'reset') {
|
|
113
141
|
if (quietMode) return;
|
|
114
|
-
|
|
142
|
+
const line = `${colors[color]}${message}${colors.reset}`;
|
|
143
|
+
if (activeBar && process.stdout.isTTY) {
|
|
144
|
+
// Pin the bar: clear it, print the log line above, redraw the bar below.
|
|
145
|
+
process.stdout.write(`\r${CLEAR_EOL}${line}\n${activeBar}${CLEAR_EOL}`);
|
|
146
|
+
} else {
|
|
147
|
+
console.log(line);
|
|
148
|
+
}
|
|
115
149
|
}
|
|
116
150
|
|
|
117
|
-
let lastLoggedPercent = {};
|
|
118
|
-
|
|
119
151
|
export function logProgress(current, total, label) {
|
|
120
152
|
if (quietMode) return;
|
|
121
|
-
|
|
122
|
-
|
|
123
|
-
|
|
124
|
-
const empty = '░'.repeat(30 - bar.length);
|
|
125
|
-
// In verbose mode or non-TTY, use newlines so output isn't swallowed by pipes.
|
|
126
|
-
// Throttle to every ~2% to avoid flooding.
|
|
127
|
-
if (verboseMode || !process.stdout.isTTY) {
|
|
153
|
+
if (!process.stdout.isTTY) {
|
|
154
|
+
// Pipes / CI: throttle to ~2% and emit newlines so output isn't swallowed.
|
|
155
|
+
const percentNum = total > 0 ? (current / total) * 100 : 100;
|
|
128
156
|
const lastPct = lastLoggedPercent[label] || 0;
|
|
129
|
-
if (percentNum - lastPct >= 2 || current
|
|
157
|
+
if (percentNum - lastPct >= 2 || current >= total || current <= 1) {
|
|
130
158
|
lastLoggedPercent[label] = percentNum;
|
|
131
|
-
console.log(
|
|
132
|
-
}
|
|
133
|
-
} else {
|
|
134
|
-
process.stdout.write(`\r${colors.cyan}${label}: [${bar}${empty}] ${percent}% (${current}/${total})${colors.reset}`);
|
|
135
|
-
if (current === total) {
|
|
136
|
-
process.stdout.write('\n');
|
|
159
|
+
console.log(renderBar(current, total, label));
|
|
137
160
|
}
|
|
161
|
+
return;
|
|
162
|
+
}
|
|
163
|
+
// Interactive TTY: animate the bar in place.
|
|
164
|
+
activeBar = renderBar(current, total, label);
|
|
165
|
+
process.stdout.write(`\r${activeBar}${CLEAR_EOL}`);
|
|
166
|
+
if (current >= total) {
|
|
167
|
+
process.stdout.write('\n');
|
|
168
|
+
activeBar = null;
|
|
169
|
+
lastLoggedPercent[label] = 0;
|
|
138
170
|
}
|
|
139
171
|
}
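To make the eighth-block arithmetic in `renderBar` concrete, here is a minimal sketch of just the fill computation, without colours, label, or counters. `barFill` is a hypothetical name introduced for illustration; the constants are copied from the diff.

```javascript
// Illustrative sketch only: barFill is an assumed helper, not part of indexer-utils.js.
const BAR_WIDTH = 30;
const SUB_BLOCKS = ['', '▏', '▎', '▍', '▌', '▋', '▊', '▉']; // eighth-block partial fills

function barFill(current, total) {
  // Clamp to [0, 1]; a zero total is treated as complete, as in renderBar.
  const ratio = total > 0 ? Math.max(0, Math.min(1, current / total)) : 1;
  const eighths = Math.round(ratio * BAR_WIDTH * 8); // progress in 1/8-cell units
  const full = Math.floor(eighths / 8);              // whole cells
  const partial = SUB_BLOCKS[eighths % 8];           // at most one partial cell
  const empty = Math.max(0, BAR_WIDTH - full - (partial ? 1 : 0));
  return '█'.repeat(full) + partial + '░'.repeat(empty);
}

// 26/100 -> 62 eighths -> 7 full cells, a 6/8 partial cell, 22 empty cells.
console.log(`[${barFill(26, 100)}]`);
console.log(`[${barFill(1, 2)}]`);
```

Because the partial block occupies one cell of its own, the fill is always exactly `BAR_WIDTH` characters wide, which keeps the closing bracket in the same column as the bar advances.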
|
|
140
172
|
|
package/core/infrastructure/simd-distance.js
CHANGED
|
@@ -72,12 +72,17 @@ async function initWasm() {
|
|
|
72
72
|
|
|
73
73
|
initDone = true;
|
|
74
74
|
|
|
75
|
-
|
|
76
|
-
|
|
77
|
-
|
|
78
|
-
|
|
79
|
-
|
|
80
|
-
|
|
75
|
+
// Load-time tier diagnostic — gated behind DEBUG so it doesn't precede the
|
|
76
|
+
// banner / clutter normal output (the active tier is also shown in `init`'s
|
|
77
|
+
// summary as "MaxSim: …"). Set DEBUG=1 to surface it.
|
|
78
|
+
if (process.env.DEBUG) {
|
|
79
|
+
if (nativeMaxsim) {
|
|
80
|
+
console.error('[MaxSim] Tier 1: Native Rust + Rayon (parallel SIMD)');
|
|
81
|
+
} else if (maxsimExports || wasmExports?.maxsim_f32) {
|
|
82
|
+
console.error('[MaxSim] Tier 2: WASM SIMD f32x4');
|
|
83
|
+
} else {
|
|
84
|
+
console.error('[MaxSim] Tier 3: JS fallback');
|
|
85
|
+
}
|
|
81
86
|
}
|
|
82
87
|
|
|
83
88
|
return true;
|
package/package.json
CHANGED
|
@@ -1,6 +1,6 @@
|
|
|
1
1
|
{
|
|
2
2
|
"name": "sweet-search",
|
|
3
|
-
"version": "2.5.
|
|
3
|
+
"version": "2.5.6",
|
|
4
4
|
"description": "Sweet Search - SOTA Hybrid Code Search Engine with WASM CatBoost Query Router, Semantic/Lexical/Structural Search, and Multilingual Support",
|
|
5
5
|
"type": "module",
|
|
6
6
|
"main": "core/search/sweet-search.js",
|
|
@@ -13,12 +13,12 @@
|
|
|
13
13
|
"author": "Marko Sladojevic <marko@panonit.com> (https://panonit.com)",
|
|
14
14
|
"repository": {
|
|
15
15
|
"type": "git",
|
|
16
|
-
"url": "git+https://github.com/
|
|
16
|
+
"url": "git+https://github.com/mrsladoje/sweet-search.git"
|
|
17
17
|
},
|
|
18
18
|
"bugs": {
|
|
19
|
-
"url": "https://github.com/
|
|
19
|
+
"url": "https://github.com/mrsladoje/sweet-search/issues"
|
|
20
20
|
},
|
|
21
|
-
"homepage": "https://github.com/
|
|
21
|
+
"homepage": "https://github.com/mrsladoje/sweet-search#readme",
|
|
22
22
|
"keywords": [
|
|
23
23
|
"sweet-search",
|
|
24
24
|
"code-search",
|
|
@@ -50,8 +50,11 @@
|
|
|
50
50
|
"core/vector-store/",
|
|
51
51
|
"core/query/",
|
|
52
52
|
"core/skills/",
|
|
53
|
+
"core/banner/",
|
|
54
|
+
"assets/banner/",
|
|
53
55
|
"mcp/",
|
|
54
56
|
"scripts/benchmark-harness.js",
|
|
57
|
+
"scripts/postinstall-banner.js",
|
|
55
58
|
"scripts/init.js",
|
|
56
59
|
"scripts/uninstall.js",
|
|
57
60
|
"scripts/verify-runtime.js",
|
|
@@ -78,6 +81,8 @@
|
|
|
78
81
|
],
|
|
79
82
|
"scripts": {
|
|
80
83
|
"init": "node scripts/init.js",
|
|
84
|
+
"postinstall": "node scripts/postinstall-banner.js",
|
|
85
|
+
"bake:banner": "node scripts/bake-banner.mjs",
|
|
81
86
|
"build:assets": "node scripts/generate-asset-manifest.js",
|
|
82
87
|
"lint": "eslint core/",
|
|
83
88
|
"build": "node -e \"import('./core/search/sweet-search.js')\" && echo 'Build OK'",
|
|
@@ -152,17 +157,18 @@
|
|
|
152
157
|
"eslint": "^9.39.4",
|
|
153
158
|
"fast-check": "^4.5.3",
|
|
154
159
|
"p-map": "^7.0.4",
|
|
160
|
+
"puppeteer-core": "^25.1.0",
|
|
155
161
|
"typescript": "^5.9.3",
|
|
156
162
|
"vitest": "^4.0.16"
|
|
157
163
|
},
|
|
158
164
|
"optionalDependencies": {
|
|
159
165
|
"usearch": "^2.21.4",
|
|
160
|
-
"@sweet-search/native-darwin-arm64": "2.5.
|
|
161
|
-
"@sweet-search/native-darwin-x64": "2.5.
|
|
162
|
-
"@sweet-search/native-linux-arm64-gnu": "2.5.
|
|
163
|
-
"@sweet-search/native-linux-arm64-gnu-cuda": "2.5.
|
|
164
|
-
"@sweet-search/native-linux-x64-gnu": "2.5.
|
|
165
|
-
"@sweet-search/native-linux-x64-gnu-cuda": "2.5.
|
|
166
|
+
"@sweet-search/native-darwin-arm64": "2.5.6",
|
|
167
|
+
"@sweet-search/native-darwin-x64": "2.5.6",
|
|
168
|
+
"@sweet-search/native-linux-arm64-gnu": "2.5.6",
|
|
169
|
+
"@sweet-search/native-linux-arm64-gnu-cuda": "2.5.6",
|
|
170
|
+
"@sweet-search/native-linux-x64-gnu": "2.5.6",
|
|
171
|
+
"@sweet-search/native-linux-x64-gnu-cuda": "2.5.6"
|
|
166
172
|
},
|
|
167
173
|
"engines": {
|
|
168
174
|
"node": ">=18.0.0"
|
package/scripts/init.js
CHANGED
|
@@ -1450,6 +1450,12 @@ export async function runInit(args) {
|
|
|
1450
1450
|
return;
|
|
1451
1451
|
}
|
|
1452
1452
|
|
|
1453
|
+
// 0. Animated banner (best-effort; only on an interactive TTY, never in CI/pipes).
|
|
1454
|
+
if (process.stdout.isTTY && !process.env.CI && !process.env.NO_BANNER && !process.env.SWEET_SEARCH_NO_BANNER) {
|
|
1455
|
+
// query:false — init is interactive (readline); avoid any stdin contention with the terminal capability probe.
|
|
1456
|
+
try { const { showBanner } = await import('../core/banner/render-banner.js'); await showBanner({ query: false }); } catch { /* non-fatal */ }
|
|
1457
|
+
}
|
|
1458
|
+
|
|
1453
1459
|
// 1. Node.js version check
|
|
1454
1460
|
checkNodeVersion();
|
|
1455
1461
|
|
|
@@ -2016,7 +2022,7 @@ function runCoremlCascadeBuild(options = {}) {
|
|
|
2016
2022
|
` The CoreML cascade build path currently requires a local clone\n` +
|
|
2017
2023
|
` of the sweet-search repository — it is not yet shipped via npm.\n` +
|
|
2018
2024
|
` To build the cascade:\n` +
|
|
2019
|
-
` git clone https://github.com/
|
|
2025
|
+
` git clone https://github.com/mrsladoje/sweet-search\n` +
|
|
2020
2026
|
` cd sweet-search\n` +
|
|
2021
2027
|
` node scripts/build-coreml-cascade.js\n` +
|
|
2022
2028
|
` Then point your install at the managed cache (init detects it).`,
|
|
@@ -0,0 +1,46 @@
|
|
|
1
|
+
#!/usr/bin/env node
|
|
2
|
+
/**
|
|
3
|
+
* postinstall — play the animated banner once after install.
|
|
4
|
+
*
|
|
5
|
+
* npm pipes lifecycle-script stdout (it's not a TTY), so we render to the
|
|
6
|
+
* controlling terminal directly via /dev/tty when possible. This is Unix-only;
|
|
7
|
+
* on Windows (no /dev/tty) or when there is no controlling terminal (CI, detached,
|
|
8
|
+
* sandboxed installs) we simply skip.
|
|
9
|
+
*
|
|
10
|
+
* Defensive by design: renders only to a real terminal, honours CI / NO_BANNER /
|
|
11
|
+
* SWEET_SEARCH_NO_BANNER, swallows every error, and always exits 0 so it can never
|
|
12
|
+
* fail `npm install`.
|
|
13
|
+
*/
|
|
14
|
+
import process from 'node:process';
|
|
15
|
+
import tty from 'node:tty';
|
|
16
|
+
import { openSync, closeSync } from 'node:fs';
|
|
17
|
+
import { dirname, join } from 'node:path';
|
|
18
|
+
import { fileURLToPath } from 'node:url';
|
|
19
|
+
|
|
20
|
+
async function run() {
|
|
21
|
+
const env = process.env;
|
|
22
|
+
if (env.CI || env.NO_BANNER || env.SWEET_SEARCH_NO_BANNER) return;
|
|
23
|
+
|
|
24
|
+
// Pick an output stream that is a real terminal.
|
|
25
|
+
let stream = process.stdout.isTTY ? process.stdout : null;
|
|
26
|
+
let ownedFd = -1;
|
|
27
|
+
if (!stream && process.platform !== 'win32') {
|
|
28
|
+
try {
|
|
29
|
+
ownedFd = openSync('/dev/tty', 'r+'); // throws if no controlling terminal
|
|
30
|
+
const s = new tty.WriteStream(ownedFd);
|
|
31
|
+
if (s.isTTY) stream = s;
|
|
32
|
+
} catch { /* no controlling terminal — skip */ }
|
|
33
|
+
}
|
|
34
|
+
if (!stream) return;
|
|
35
|
+
|
|
36
|
+
try {
|
|
37
|
+
const here = dirname(fileURLToPath(import.meta.url));
|
|
38
|
+
const { showBanner } = await import(join(here, '..', 'core', 'banner', 'render-banner.js'));
|
|
39
|
+
// query:false — we have no matching stdin for this tty stream; rely on env-based detection.
|
|
40
|
+
const shown = await showBanner({ stream, env, query: false, maxMs: 2200 });
|
|
41
|
+
if (shown) stream.write(' sweet-search installed — run `sweet-search init` to get started.\n');
|
|
42
|
+
} catch { /* never break an install */ }
|
|
43
|
+
finally { if (ownedFd >= 0) { try { closeSync(ownedFd); } catch { /* noop */ } } }
|
|
44
|
+
}
|
|
45
|
+
|
|
46
|
+
run().finally(() => process.exit(0));
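The same opt-out checks (`CI`, `NO_BANNER`, `SWEET_SEARCH_NO_BANNER`, plus a TTY test) recur at each banner call site in this diff. As a sketch of that condition, the predicate below collects it in one place. `shouldShowBanner` is a hypothetical helper for illustration; the package inlines the expression rather than sharing a function.

```javascript
// Illustrative sketch only: shouldShowBanner is an assumed helper, not part of sweet-search.
function shouldShowBanner({ env, isTTY, quiet = false }) {
  if (quiet || !isTTY) return false; // never render into pipes or quiet runs
  // Environment values are strings, so any non-empty value opts out.
  return !env.CI && !env.NO_BANNER && !env.SWEET_SEARCH_NO_BANNER;
}

console.log(shouldShowBanner({ env: {}, isTTY: true }));                  // prints true
console.log(shouldShowBanner({ env: { CI: 'true' }, isTTY: true }));      // prints false
console.log(shouldShowBanner({ env: { NO_BANNER: '0' }, isTTY: true }));  // prints false
console.log(shouldShowBanner({ env: {}, isTTY: false }));                 // prints false
```

Note that the checks test truthiness of strings: `NO_BANNER=0` still disables the banner because `'0'` is a non-empty string, while an empty `NO_BANNER=` does not.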
|