sweet-search 2.5.8 → 2.5.10
package/README.md
CHANGED
@@ -233,25 +233,24 @@ The win is **harness-adaptive**: where the native loop is disciplined (Claude Co
 <a id="bench-paper-type"></a>
 ### 3. Paper-type retrieval benchmarks - *academic NL→code IR*
 
-> [!
->
->
->
->
-> replaced with fresh numbers. Until then, expect the real results to be **higher**.
+> [!NOTE]
+> **Refreshed on the current engine (June 2026).** AdvTest, CoIR, CoSQA, and M2CRB were just
+> re-run on the latest build - the one with the late-interaction correctness fixes, HNSW tuning,
+> and the May 2026 ranking overhaul - and every one of them moved **up**. GCSN, CoSQA+, and CLARC
+> were already current. Reproduction artifacts are in [`eval/results/`](eval/results/).
 
 Every number below is the **`ss-search` pipeline end-to-end** - the same binary you install, querying
-against the **full corpus** (no 99-distractor shortcuts),
+against the **full corpus** (no 99-distractor shortcuts), on an M3 Max.
 
 | Benchmark | What it tests | # Queries | MRR@10 |
 |-----------|---------------|--------:|-------:|
 | **GenCodeSearchNet** | NL→code, 6 languages | 6,000 | **86.6** |
-| **M2CRB** | multilingual NL→code (ES/PT/DE/FR → Py/Java/JS) | 2,814 | **
-| CoSQA (test split) | web queries → Python | 500 |
+| **M2CRB** | multilingual NL→code (ES/PT/DE/FR → Py/Java/JS) | 2,814 | **65.9** |
+| CoSQA (test split) | web queries → Python | 500 | 98.8 |
 | CoSQA+ | web queries → Python, multi-match | 20,604 | 72.1 |
 | CLARC | NL→C/C++ (systems code) | 1,245 | 67.4 |
-| AdvTest
-| CoIR
+| AdvTest | adversarially renamed Python | 1,000 | **99.1** |
+| CoIR | 10 datasets, 14 languages | 4,500 | **72.4** |
 
 **GenCodeSearchNet: the strongest result published anywhere, as far as we can tell.** The benchmark's
 own paper tops out at MRR ≤ 0.42 for its fine-tuned baselines (and ≤ 0.10 on the cross-lingual subsets),
@@ -260,15 +259,15 @@ query**. sweet-search scores **0.866**, retrieving from the entire 6,000-documen
 
 **M2CRB: best published number, no fine-tuning.** The benchmark paper's best model - a CodeBERT
 *fine-tuned on the task's training mix* - reaches 52.7 (auMRRc, a metric averaged over smaller retrieval
-pools). sweet-search reaches **
+pools). sweet-search reaches **65.9 full-corpus MRR@10 out of the box**, on Spanish, Portuguese, German,
 and French queries.
 
 <details>
-<summary><b>Methodology &
+<summary><b>Methodology & build dates</b></summary>
 
 - **Reproduction:** result artifacts live in `eval/results/`; rerun via `eval/run_all.js`.
 - **Protocol note:** published baselines for GCSN and CoSQA-style benchmarks typically rank the gold snippet against 99 sampled distractors. All sweet-search numbers rank against the full benchmark corpus - strictly harder.
--
+- **Build dates:** AdvTest, CoIR, CoSQA, and M2CRB were re-run on the **June 2026** engine (0 errors on each); GCSN, CoSQA+, and CLARC are from the May 2026 build. All numbers reflect the current late-interaction pipeline - the correctness fixes, HNSW tuning, and May ranking overhaul. The June re-runs all improved over their earlier builds (AdvTest 91.5→99.1, CoIR 57.3→72.4, CoSQA 97.0→98.8, M2CRB 60.2→65.9).
 - **Honesty corner:** CrossCodeEval - cross-file *completion context* retrieval, a different task than NL search - sits at 0.12. We don't optimize for it and report it anyway.
 - Dates and per-language breakdowns: [`docs/BENCHMARKS_EXPLAINED.md`](docs/BENCHMARKS_EXPLAINED.md).
 
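For readers unfamiliar with the metric in the table above: MRR@10 is the mean, over all queries, of the reciprocal rank of the first relevant document within the top 10 results, counting 0 when none appears. The sketch below illustrates that definition only. The function name `mrrAtK`, the input shape, and the sample data are assumptions made for illustration and are not taken from the `eval/` tooling; the README appears to report the value scaled by 100.

```javascript
// Hypothetical helper (not part of sweet-search): mean reciprocal rank at cutoff k.
// `runs` is an array of { ranked: string[], relevant: Set<string> }.
function mrrAtK(runs, k = 10) {
  if (runs.length === 0) return 0;
  let sum = 0;
  for (const { ranked, relevant } of runs) {
    // Position of the first relevant document inside the top-k window.
    const idx = ranked.slice(0, k).findIndex((id) => relevant.has(id));
    if (idx !== -1) sum += 1 / (idx + 1); // a miss contributes 0
  }
  return sum / runs.length;
}

// Illustrative data only.
const runs = [
  { ranked: ['a', 'b', 'c'], relevant: new Set(['a']) },   // rank 1 -> 1.0
  { ranked: ['x', 'y', 'z'], relevant: new Set(['y']) },   // rank 2 -> 0.5
  { ranked: ['p', 'q', 'r'], relevant: new Set(['zzz']) }, // miss   -> 0
];
console.log(mrrAtK(runs)); // 0.5
```

Ranking against the full corpus rather than 99 sampled distractors only changes what goes into `ranked`; the formula stays the same.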
@@ -540,31 +539,43 @@ without another search.
 
 ## An Agent Prompt That Was Evolved, Not Written
 
-
+Shipping six tools is easy. Getting an agent to *stop grepping in circles* is the hard part.
 
-`sweet-search init` installs a ~1k-token system prompt that
-
-
-
-
+So `sweet-search init` installs a ~1k-token system prompt that we **didn't write** - we *grew* it.
+A GEPA-style loop mutated candidate prompts, scored each on a dual Pareto front (**accuracy × cost**)
+against **two different production agents at once** - Claude Code (Sonnet) and Codex (GPT-5.5) - kept the
+survivors, and repeated. A final correctness pass hardened the winner. ~1k tokens, one job: teach the
+agent to search *well*.
 
-
+**The five rules it encodes:**
 
-
-
-
-
-
+| | Rule | What it kills |
+|--|--|--|
+| | **Cheapest tool first** | Got an exact symbol? One `ss-grep`, trust the top hit, stop - no semantic search "just to confirm." |
+| | **Trust the ranking** | At most one narrow read to confirm; never re-run a hit that already matched. |
+| | **Absence is an answer** | Two empty probes (one semantic, one lexical) settle a negative - no third synonym, no `find`/`ls` spiral. |
+| | **No raw-shell escape** | The #1 token-waster in our trace analysis: agents bailing to dozens of raw `grep`/`find` calls after one miss. Door closed. |
+| | **Think before you dig** | Before a third probe, the agent states what it knows and what its blind spot is. |
+
+**The receipts** - *held-out discipline throughout: a dev set to iterate on, a held-out set touched only at milestones, a sealed vault opened exactly once.*
+
+| Validation gate | Result |
+|--|--|
+| **Held-out** (30 probes × both agents) | joint score *(worst of the two)* **0.988** |
+| **Out-of-distribution** (8 languages never seen in the loop) | **0.952** - *every* language ≥ 0.79, zero weak spots |
+| **Adversarial counter-probes** | **1.00 / 1.00** |
+| **Held-out model families** (never optimized on) | MiMo **0.988** · Qwen **0.980** - it generalizes, it doesn't memorize |
+| **Paraphrase robustness** (reword the prompt, same behavior) | correctness-weighted **0.95 / 0.93** |
 
 <details>
-<summary><b
-
-- **
-- **
-- **
-- **
--
--
+<summary><b>How it was actually built (the honest version)</b></summary>
+
+- **Seeds → survivors:** 15 hand-authored seed prompts entered a reflective-evolution loop (an agent reads the *real* tool-call traces, proposes one targeted edit, we keep what helps). Operators included trajectory crossover, structural pivots, tool-name masking, and a pruner that fights prompt bloat.
+- **Two targets, jointly:** every candidate was scored on **both** Claude Code/Sonnet **and** Codex/GPT-5.5 with Maximin discipline (a prompt is only as good as its *worse* target), so it can't overfit one model's quirks.
+- **What actually won:** not clever phrasing - **terseness** (a shorter prompt re-sent every turn is cheaper), a **leaner tool mix** (grep/read over heavy semantic blocks that fatten the transcript), and **decisiveness on no-match** (stop spiraling). We report this plainly because it's what the traces showed.
+- **The correctness pass:** the shipped prompt ("M++") is the cost-winner plus 7 edits that fix factual descriptions of the tools - routing byte-identical, accuracy held, cost unchanged. A lateral move that buys honesty.
+- **Held-out everything:** dev to iterate, held-out checked only at milestones, a sealed vault opened once, plus held-out *model families* (MiMo, Qwen) and a reasoning-mode replay (MiniMax **0.963**) it never trained against. Figures: [`docs/PHASE7.md`](docs/PHASE7.md) (internal probe suites; an externally-reproducible suite is in progress).
+- **Idempotent install:** `init` writes a marker-delimited block into `CLAUDE.md` / `AGENTS.md` / `GEMINI.md` / `.cursor/rules` - re-run it freely, it never touches anything else you wrote.
 
 </details>
 
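The selection step described in this README section (score each candidate on both agents, take the worse of the two, keep what survives on an accuracy × cost front) can be sketched roughly as follows. This is one plausible reading, not the project's actual optimizer: the candidate shape, the field names, and every number below are invented for illustration.

```javascript
// Hypothetical candidates: per-agent accuracy plus a token cost (all values made up).
const candidates = [
  { id: 'A', acc: { claude: 0.95, codex: 0.80 }, cost: 900 },
  { id: 'B', acc: { claude: 0.90, codex: 0.88 }, cost: 1000 },
  { id: 'C', acc: { claude: 0.85, codex: 0.86 }, cost: 1200 },
  { id: 'D', acc: { claude: 0.70, codex: 0.75 }, cost: 600 },
];

// Maximin: a prompt is only as good as its worse target.
const jointScore = (c) => Math.min(...Object.values(c.acc));

// `a` dominates `b` when it is no worse on both axes and strictly better on one.
function dominates(a, b) {
  const sa = jointScore(a);
  const sb = jointScore(b);
  return sa >= sb && a.cost <= b.cost && (sa > sb || a.cost < b.cost);
}

// Keep every candidate that no other candidate dominates.
function paretoFront(pool) {
  return pool.filter((c) => !pool.some((other) => dominates(other, c)));
}

const front = paretoFront(candidates).map((c) => c.id);
console.log(front); // [ 'A', 'B', 'D' ]
```

Candidate C drops out because B is both more accurate on its worse target and cheaper; A, B, and D each represent a different accuracy/cost trade-off and stay on the front.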
@@ -123,7 +123,6 @@ export function isVerboseMode() {
 const BAR_WIDTH = 30;
 const LABEL_COL = 17; // pad "Label:" to this width so every bar's [ ] aligns
 const SUB_BLOCKS = ['', '▏', '▎', '▍', '▌', '▋', '▊', '▉']; // eighth-block partial fills
-const CLEAR_EOL = '\x1b[K';
 const liveBars = new Map(); // label -> { current, total }; insertion order = display order
 let regionLines = 0; // bar lines currently pinned at the bottom (TTY)
 let lastLoggedPercent = {};
@@ -131,21 +130,38 @@ let deferredLogs = []; // lines held back while parallel bars run (flus
 
 function renderBar(current, total, label) {
   const ratio = total > 0 ? Math.max(0, Math.min(1, current / total)) : 1;
-  const
+  const head = `${label}:`.padEnd(LABEL_COL); // right border aligns across phases
+  const pct = (ratio * 100).toFixed(1).padStart(5);
+  const prefix = `${head}[`;
+  // Size the bar so the whole line fits the terminal width - a wrapped line would
+  // span two physical rows and break the cursor-up redraw math (→ duplicate bars).
+  // Drop the (current/total) counts first when the terminal is too cramped.
+  const cols = process.stdout.columns || 80;
+  let suffix = `] ${pct}% (${current}/${total})`;
+  if (cols - prefix.length - suffix.length - 1 < 6) suffix = `] ${pct}%`;
+  const width = Math.max(1, Math.min(BAR_WIDTH, cols - prefix.length - suffix.length - 1));
+  const eighths = Math.round(ratio * width * 8);
   const full = Math.floor(eighths / 8);
   const partial = SUB_BLOCKS[eighths % 8];
   const bar = '█'.repeat(full) + partial;
-  const empty = '░'.repeat(Math.max(0,
-
-  const pct = (ratio * 100).toFixed(1).padStart(5);
-  return `${colors.cyan}${head}[${bar}${empty}] ${pct}% (${current}/${total})${colors.reset}`;
+  const empty = '░'.repeat(Math.max(0, width - full - (partial ? 1 : 0)));
+  return `${colors.cyan}${prefix}${bar}${empty}${suffix}${colors.reset}`;
 }
 
-
-
-
-
-
+// (Re)draw the live region in place (the `log-update` pattern). Invariant: the
+// cursor enters and leaves at the END of the last bar line - NO trailing newline
+// - so a redraw never pushes a stale copy of a bar into scrollback. Each redraw
+// moves up to the first region line and erases to end-of-screen (\x1b[J) before
+// rewriting. `aboveLine`, if given, scrolls one permanent line above the bars.
+function regionEscape(aboveLine) {
+  const bars = [...liveBars].map(([l, b]) => renderBar(b.current, b.total, l));
+  let out = '';
+  if (regionLines > 1) out += `\x1b[${regionLines - 1}A`; // up to the first region line
+  out += '\r\x1b[J'; // col 0, erase region + everything below
+  if (aboveLine != null) out += aboveLine + '\n'; // permanent line above the bars
+  out += bars.join('\n'); // bars - no trailing newline
+  regionLines = bars.length;
+  return out;
 }
 
 export function log(message, color = 'reset') {
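To make the width-fitting rule in the new `renderBar` easier to check by hand, here is a stripped-down sketch of just the sizing arithmetic. It is not the shipped function: it takes the terminal width as a `cols` parameter instead of reading `process.stdout.columns`, uses plain ASCII fill characters, and omits colors and eighth-block partials. The name `fitBar` is made up for this example.

```javascript
const BAR_WIDTH = 30;
const LABEL_COL = 17;

// Sketch of the sizing logic only (hypothetical helper, ASCII fill, no colors).
function fitBar(current, total, label, cols = 80) {
  const ratio = total > 0 ? Math.max(0, Math.min(1, current / total)) : 1;
  const head = `${label}:`.padEnd(LABEL_COL);
  const prefix = `${head}[`;
  const pct = (ratio * 100).toFixed(1).padStart(5);
  let suffix = `] ${pct}% (${current}/${total})`;
  // If fewer than 6 bar cells would remain, drop the (current/total) counts first.
  if (cols - prefix.length - suffix.length - 1 < 6) suffix = `] ${pct}%`;
  // Leave one spare column so the line never reaches the wrap point.
  const width = Math.max(1, Math.min(BAR_WIDTH, cols - prefix.length - suffix.length - 1));
  const full = Math.round(ratio * width);
  return prefix + '#'.repeat(full) + '-'.repeat(width - full) + suffix;
}

console.log(fitBar(5, 10, 'Embedding', 80)); // full-width bar with counts
console.log(fitBar(5, 10, 'Embedding', 39)); // cramped: counts dropped, shorter bar
```

With an 18-character prefix and a 15-character suffix, an 80-column terminal leaves room for the full 30-cell bar, while a 39-column terminal falls below the 6-cell threshold and switches to the percent-only suffix.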
@@ -153,17 +169,13 @@ export function log(message, color = 'reset') {
   const line = `${colors[color]}${message}${colors.reset}`;
   if (regionLines > 0 && process.stdout.isTTY) {
     if (liveBars.size > 1) {
-      // Parallel bars
-      // region
-      //
-      // once every bar in the region completes.
+      // Parallel bars live: defer the line so it can't disturb the region. Any
+      // mid-region print scrolls a stale bar-pair into scrollback. Flushed once
+      // every bar in the region finishes.
       deferredLogs.push(line);
       return;
     }
-    //
-    let out = `\x1b[${regionLines}A\r${line}${CLEAR_EOL}\n`;
-    for (const [label, b] of liveBars) out += renderBar(b.current, b.total, label) + CLEAR_EOL + '\n';
-    process.stdout.write(out);
+    process.stdout.write(regionEscape(line)); // single bar: line above, bar redrawn below
   } else {
     console.log(line);
   }
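The in-place redraw that `log` and `logProgress` now share is easier to reason about as a pure function. The sketch below mirrors the escape-sequence logic of `regionEscape` but takes the previous region height and the already-rendered bar lines as arguments instead of reading module state; the name `redraw` and its return shape are assumptions made for this example. `ESC[nA` moves the cursor up n rows and `ESC[J` erases from the cursor to the end of the screen.

```javascript
// Hypothetical pure variant of the redraw step: returns the bytes to write
// plus the new region height, without touching any shared state.
function redraw(prevLines, barLines, aboveLine) {
  let out = '';
  // The cursor sits at the END of the last bar line, so moving up
  // prevLines - 1 rows lands on the first region line.
  if (prevLines > 1) out += `\x1b[${prevLines - 1}A`;
  out += '\r\x1b[J'; // column 0, then erase the region and everything below
  if (aboveLine != null) out += aboveLine + '\n'; // permanent line above the bars
  out += barLines.join('\n'); // deliberately no trailing newline
  return { out, regionLines: barLines.length };
}

// First draw of one bar, then a log line printed above a two-bar region.
const first = redraw(0, ['bar-1']);
const second = redraw(2, ['bar-1', 'bar-2'], 'indexed 10 files');
console.log(JSON.stringify(first.out));
console.log(JSON.stringify(second.out));
```

Because the output never ends in a newline, the terminal does not scroll on a redraw, which is what keeps stale copies of a bar out of scrollback.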
@@ -181,18 +193,18 @@ export function logProgress(current, total, label) {
     }
     return;
   }
-  // Interactive TTY: update this bar in the live region and redraw.
+  // Interactive TTY: update this bar in the live region and redraw in place.
   liveBars.set(label, { current, total });
-
+  process.stdout.write(regionEscape());
   // Once every live bar is complete, commit the region (leave it on screen).
   let allDone = true;
   for (const b of liveBars.values()) if (b.current < b.total) { allDone = false; break; }
   if (allDone) {
+    process.stdout.write('\n'); // move below the finished bars (cursor was at their end)
     for (const k of liveBars.keys()) lastLoggedPercent[k] = 0;
     liveBars.clear();
     regionLines = 0;
-    // Flush
-    // the finished bars, in arrival order.
+    // Flush lines deferred while parallel bars ran - now below the finished bars.
     if (deferredLogs.length) {
       for (const l of deferredLogs) console.log(l);
       deferredLogs = [];
@@ -1654,7 +1654,7 @@ export class LateInteractionIndex {
       this._segmentDir = segDir;
       this._currentSegment = new Map();
       await this._saveAliasSidecar();
-      console.log(`LateInteraction: Saved ${this.documents.size} documents across ${this._segments.length} segments`);
+      if (process.env.DEBUG) console.log(`LateInteraction: Saved ${this.documents.size} documents across ${this._segments.length} segments`);
       return;
     }
   }
@@ -1743,7 +1743,7 @@ export class LateInteractionIndex {
       this._currentSegment = new Map();
 
       await this._saveAliasSidecar();
-      console.log(`LateInteraction: Saved ${this.documents.size} documents across ${newSegments.length} segments`);
+      if (process.env.DEBUG) console.log(`LateInteraction: Saved ${this.documents.size} documents across ${newSegments.length} segments`);
       return;
     }
 
@@ -1820,7 +1820,10 @@ export class LateInteractionIndex {
     await this._saveAliasSidecar();
 
     const sizeMB = (bytesWritten / 1024 / 1024).toFixed(2);
-
+    // DEBUG-only: this prints during the indexer's parallel embed+LI progress region;
+    // a direct write here moves the cursor and duplicates a bar. The indexer's
+    // "✓ Late interaction index built: N docs (X MB)" line already reports this.
+    if (process.env.DEBUG) console.log(`LateInteraction: Saved ${this.documents.size} documents (${sizeMB} MB)`);
   }
 
   /**
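The three `LateInteractionIndex` changes above all apply the same guard: a status line is printed only when the `DEBUG` environment variable is set, so it cannot land inside the live progress region. A small sketch of that pattern as a reusable helper follows. The helper `debugLog` and its injectable `env` and `sink` parameters are hypothetical and exist only to make the behavior testable; the package itself inlines the check.

```javascript
// Hypothetical helper: emit a message only when DEBUG is set to a non-empty value.
function debugLog(message, env = process.env, sink = console.log) {
  // Environment variables are strings, so any non-empty value
  // (including "0" or "false") counts as enabled here.
  if (env.DEBUG) sink(message);
}

const seen = [];
debugLog('LateInteraction: Saved 3 documents', {}, (m) => seen.push(m));             // suppressed
debugLog('LateInteraction: Saved 3 documents', { DEBUG: '1' }, (m) => seen.push(m)); // emitted
console.log(seen.length); // 1
```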
package/package.json
CHANGED
@@ -1,6 +1,6 @@
 {
   "name": "sweet-search",
-  "version": "2.5.
+  "version": "2.5.10",
   "description": "Sweet Search - SOTA Hybrid Code Search Engine with WASM CatBoost Query Router, Semantic/Lexical/Structural Search, and Multilingual Support",
   "type": "module",
   "main": "core/search/sweet-search.js",
@@ -163,12 +163,12 @@
   },
   "optionalDependencies": {
     "usearch": "^2.21.4",
-    "@sweet-search/native-darwin-arm64": "2.5.
-    "@sweet-search/native-darwin-x64": "2.5.
-    "@sweet-search/native-linux-arm64-gnu": "2.5.
-    "@sweet-search/native-linux-arm64-gnu-cuda": "2.5.
-    "@sweet-search/native-linux-x64-gnu": "2.5.
-    "@sweet-search/native-linux-x64-gnu-cuda": "2.5.
+    "@sweet-search/native-darwin-arm64": "2.5.10",
+    "@sweet-search/native-darwin-x64": "2.5.10",
+    "@sweet-search/native-linux-arm64-gnu": "2.5.10",
+    "@sweet-search/native-linux-arm64-gnu-cuda": "2.5.10",
+    "@sweet-search/native-linux-x64-gnu": "2.5.10",
+    "@sweet-search/native-linux-x64-gnu-cuda": "2.5.10"
   },
   "engines": {
     "node": ">=18.0.0"
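The six `@sweet-search/native-*` entries follow a platform/architecture naming scheme. How sweet-search actually selects one at runtime is not shown in this diff, so the following is only a sketch of how such a name could be derived from Node's `process.platform` and `process.arch` values. The function `nativePackageName` and its `cuda` option are hypothetical; only the package names themselves come from `package.json`.

```javascript
// Package names copied from optionalDependencies; everything else is an assumption.
const NATIVE_PACKAGES = new Set([
  '@sweet-search/native-darwin-arm64',
  '@sweet-search/native-darwin-x64',
  '@sweet-search/native-linux-arm64-gnu',
  '@sweet-search/native-linux-arm64-gnu-cuda',
  '@sweet-search/native-linux-x64-gnu',
  '@sweet-search/native-linux-x64-gnu-cuda',
]);

// Hypothetical resolver: returns the matching package name, or null when
// no prebuilt binary is listed for this platform/arch combination.
function nativePackageName(platform, arch, { cuda = false } = {}) {
  const libc = platform === 'linux' ? '-gnu' : '';
  const gpu = cuda && platform === 'linux' ? '-cuda' : '';
  const name = `@sweet-search/native-${platform}-${arch}${libc}${gpu}`;
  return NATIVE_PACKAGES.has(name) ? name : null;
}

console.log(nativePackageName(process.platform, process.arch));
```

Because these are optional dependencies, npm skips the ones that fail to install on the current machine, which is why a resolver like this needs a `null` fallback for unsupported platforms.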
@@ -29,9 +29,13 @@ function run() {
   }
 
   const c = (n, s) => `\x1b[${n}m${s}\x1b[0m`;
+  // SWEET SEARCH half-block wordmark (kept in sync with core/search/cli-decoration.js).
+  const L1 = 'βββ β β β βββ βββ βββ βββ βββ βββ βββ βββ βββ';
+  const L2 = 'βββ βββββ βββ βββ β βββ βββ βββ βββ βββ βββ';
   const msg = [
     '',
-    ` ${c('1;38;5;
+    ` ${c('1;38;5;135', L1)}`,
+    ` ${c('1;38;5;135', L2)}`,
     '',
     ` ${c('1', 'Get started:')}`,
     ` ${c('36', 'sweet-search init')} set up the current project`,