sweet-search 2.5.6 → 2.5.7
- package/README.md +48 -11
- package/core/embedding/embedding-local-model.js +1 -0
- package/core/graph/relationship-resolver.js +5 -1
- package/core/indexing/index-codebase-v21.js +2 -6
- package/core/indexing/indexer-ann.js +3 -3
- package/core/indexing/indexer-build.js +1 -1
- package/core/indexing/indexer-utils.js +32 -17
- package/core/infrastructure/onnx-session-utils.js +1 -0
- package/core/ranking/late-interaction-model.js +1 -0
- package/package.json +7 -7
- package/scripts/init.js +21 -5
- package/scripts/postinstall-banner.js +23 -35
package/README.md
CHANGED
|
@@ -144,27 +144,29 @@ We measure sweet-search four ways — from how much it helps a real agent down t
|
|
|
144
144
|
|
|
145
145
|
<table>
|
|
146
146
|
<tr>
|
|
147
|
-
<td width="
|
|
147
|
+
<td width="50%" valign="top">
|
|
148
148
|
|
|
149
|
-
|
|
149
|
+
🤖 **[① Code-retrieval](#bench-code-retrieval)** *(agent-in-the-loop)*<br>
|
|
150
150
|
<sub>Does it make a real coding agent **cheaper and more useful** when it searches your repo? Paired against each model's own grep-and-read loop.</sub>
|
|
151
151
|
|
|
152
152
|
</td>
|
|
153
|
-
<td width="
|
|
153
|
+
<td width="50%" valign="top">
|
|
154
154
|
|
|
155
|
-
|
|
155
|
+
🚧 **[② Task-completion](#bench-task-completion)** *(coming soon)*<br>
|
|
156
156
|
<sub>Does cheaper, denser context **compound** into a higher resolve-rate on multi-step engineering tasks? Harness in progress.</sub>
|
|
157
157
|
|
|
158
158
|
</td>
|
|
159
|
-
|
|
159
|
+
</tr>
|
|
160
|
+
<tr>
|
|
161
|
+
<td width="50%" valign="top">
|
|
160
162
|
|
|
161
|
-
|
|
163
|
+
📄 **[③ Paper-type IR](#bench-paper-type)** *(academic)*<br>
|
|
162
164
|
<sub>The standard NL→code retrieval suites (GCSN, M2CRB, CoSQA…), full-corpus MRR@10.</sub>
|
|
163
165
|
|
|
164
166
|
</td>
|
|
165
|
-
<td width="
|
|
167
|
+
<td width="50%" valign="top">
|
|
166
168
|
|
|
167
|
-
|
|
169
|
+
⚡ **[④ Engine speed](#bench-engine-speed)**<br>
|
|
168
170
|
<sub>Raw systems numbers — grep throughput, query latency, rerank kernels, HNSW.</sub>
|
|
169
171
|
|
|
170
172
|
</td>
|
|
@@ -173,6 +175,7 @@ We measure sweet-search four ways — from how much it helps a real agent down t
|
|
|
173
175
|
|
|
174
176
|
---
|
|
175
177
|
|
|
178
|
+
<a id="bench-code-retrieval"></a>
|
|
176
179
|
### 🤖 1. Code-retrieval benchmarks — *the agent-in-the-loop test*
|
|
177
180
|
|
|
178
181
|
We install the evolved agent prompt (the [GEPA-evolved search discipline](#-an-agent-prompt-that-was-evolved-not-written)), point a coding agent at a real repo, and pair it **probe-for-probe against the same model running its own native grep-and-read loop**. Same model, same tasks, same judge — the only difference is whether sweet-search is wired in.
|
|
@@ -220,12 +223,14 @@ The win is **harness-adaptive**: where the native loop is disciplined (Claude Co
|
|
|
220
223
|
|
|
221
224
|
---
|
|
222
225
|
|
|
226
|
+
<a id="bench-task-completion"></a>
|
|
223
227
|
### 🚧 2. Task-completion benchmarks — *coming soon*
|
|
224
228
|
|
|
225
229
|
> Retrieval quality is necessary but not sufficient. Cheaper, denser context only matters if it **compounds across a real, multi-step engineering task** — finding the code, understanding it, changing it, and not breaking anything. The next suite measures exactly that: **resolve-rate on SWE-bench-style multi-file tasks**, sweet-search-wired vs. native, on the same paired, multiplicity-controlled bar as above. Harness and pilot are in progress — numbers land here when they clear that bar, and not before.
|
|
226
230
|
|
|
227
231
|
---
|
|
228
232
|
|
|
233
|
+
<a id="bench-paper-type"></a>
|
|
229
234
|
### 📄 3. Paper-type retrieval benchmarks — *academic NL→code IR*
|
|
230
235
|
|
|
231
236
|
> [!WARNING]
|
|
@@ -271,6 +276,7 @@ and French queries.
|
|
|
271
276
|
|
|
272
277
|
---
|
|
273
278
|
|
|
279
|
+
<a id="bench-engine-speed"></a>
|
|
274
280
|
### ⚡ 4. Engine speed — *systems benchmarks, measured in-repo*
|
|
275
281
|
|
|
276
282
|
<div align="center">
|
|
@@ -566,9 +572,36 @@ What it teaches:
|
|
|
566
572
|
|
|
567
573
|
> **Chunk → enrich → embed → quantize** — every step on-device and in Rust. Batches are sized to *your CPU's actual cache*, two open code-models do the encoding, and two separate quantizations make the index both **faster to build** and **small enough to live in RAM**. Zero API keys; nothing ever leaves the machine.
|
|
568
574
|
|
|
569
|
-
|
|
570
|
-
|
|
571
|
-
|
|
575
|
+
<table>
|
|
576
|
+
<tr>
|
|
577
|
+
<td width="50%" valign="top">
|
|
578
|
+
|
|
579
|
+
① 🧩 **[Structure-aware chunk](#idx-chunk)**<br>
|
|
580
|
+
<sub>cAST over tree-sitter ASTs — whole functions, never sliced mid-body</sub>
|
|
581
|
+
|
|
582
|
+
</td>
|
|
583
|
+
<td width="50%" valign="top">
|
|
584
|
+
|
|
585
|
+
② 🏷️ **[Enrich from structure](#idx-enrich)**<br>
|
|
586
|
+
<sub>deterministic preamble from the code graph — **no LLM call**</sub>
|
|
587
|
+
|
|
588
|
+
</td>
|
|
589
|
+
</tr>
|
|
590
|
+
<tr>
|
|
591
|
+
<td width="50%" valign="top">
|
|
592
|
+
|
|
593
|
+
③ 🤖 **[Embed — two models](#idx-embed)**<br>
|
|
594
|
+
<sub>dense **CodeRankEmbed** + per-token **LateOn-Code**</sub>
|
|
595
|
+
|
|
596
|
+
</td>
|
|
597
|
+
<td width="50%" valign="top">
|
|
598
|
+
|
|
599
|
+
④ 🗜️ **[Quantize + persist](#idx-quantize)**<br>
|
|
600
|
+
<sub>INT8 weights → **2× faster build** · INT4 vectors → **fits in RAM**</sub>
|
|
601
|
+
|
|
602
|
+
</td>
|
|
603
|
+
</tr>
|
|
604
|
+
</table>
|
|
572
605
|
|
|
573
606
|
**The inference engine, picked for your silicon:**
|
|
574
607
|
|
|
@@ -579,10 +612,12 @@ What it teaches:
|
|
|
579
612
|
| 🟩 NVIDIA GPU (SM 7.0+) | candle **CUDA**; **flash-attention** on Ampere+ |
|
|
580
613
|
| 💻 No accelerator | **ONNX Runtime INT8** — tuned CPU path, 132 MB model, **zero GPU weights downloaded** |
|
|
581
614
|
|
|
615
|
+
<a id="idx-chunk"></a>
|
|
582
616
|
### 🧩 Chunking — every chunk is whole code, never a fixed window
|
|
583
617
|
- **[cAST](https://arxiv.org/abs/2506.15655)** structure-aware chunking over real **tree-sitter** ASTs: a recursive *split-then-merge* greedily packs sibling AST nodes up to the size cap and recurses *into* nodes too big to fit. So a chunk is always a **function, a class, or a contiguous run of declarations** — never a body cut in half, never a string split mid-literal.
|
|
584
618
|
- **14 languages** get true AST grammars — `JS · TS · TSX · Python · Go · Rust · Java · C · C++ · Ruby · PHP · Kotlin · Swift · C#` — and a **39-config regex registry** carries structure-aware chunking to **70+ more extensions**.
|
|
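The split-then-merge packing described in the cAST bullet above can be sketched as follows. This is a minimal illustration, not the package's chunker: the node shape (`text`, `size`, `children`), the function name `chunkSiblings`, and the idea of a single numeric `size` per node are all assumptions made for the example.

```javascript
// Hypothetical node shape: { text: string, size: number, children: Node[] }.
// `size` stands in for whatever size measure the real chunker uses.
function chunkSiblings(nodes, maxSize) {
  const chunks = [];
  let current = [];
  let currentSize = 0;

  // Close the chunk being built, if it holds anything.
  const flush = () => {
    if (current.length > 0) {
      chunks.push(current);
      current = [];
      currentSize = 0;
    }
  };

  for (const node of nodes) {
    if (node.size > maxSize) {
      // Too big for any chunk: close the open chunk, then recurse into the
      // node's children so the split still lands on AST boundaries.
      flush();
      const kids = node.children || [];
      if (kids.length > 0) {
        chunks.push(...chunkSiblings(kids, maxSize));
      } else {
        chunks.push([node]); // oversized leaf: keep it whole rather than slice it
      }
      continue;
    }
    // Greedy merge: keep adding siblings until the next one would overflow.
    if (currentSize + node.size > maxSize) flush();
    current.push(node);
    currentSize += node.size;
  }
  flush();
  return chunks;
}
```

With a cap of 8, siblings of size 3 and 4 share one chunk, while a size-12 node is replaced by chunks built from its own children, so no chunk boundary falls inside a node that fits.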
585
619
|
|
|
620
|
+
<a id="idx-enrich"></a>
|
|
586
621
|
### 🏷️ Metadata — context the encoder can actually see
|
|
587
622
|
- Every chunk ships its **symbol name · entity type · signature · line span** — the metadata that powers the code graph, `ss-read` annotations, and the self-contained answers everywhere else.
|
|
588
623
|
- **Contextual enrichment:** before embedding, each chunk is prefixed with a structured preamble assembled from the AST + code graph — *file path · enclosing-scope breadcrumb · name & type · merged siblings · the imports it actually uses*. **Both** encoders see it, so a bare `getId()` still retrieves on the class and module around it.
|
|
@@ -593,6 +628,7 @@ What it teaches:
|
|
|
593
628
|
- **Uses every core the hardware really has** — full count on ARM/Apple Silicon; x86 SMT siblings discounted because they don't scale inference linearly.
|
|
594
629
|
- **ORT drives the CPU path** (ONNX Runtime); GPU hosts swap in fused kernels (below). Either way inference runs off the event loop as a napi `AsyncTask`, so tokenization and SQLite writes overlap compute instead of stalling behind it.
|
|
595
630
|
|
|
631
|
+
<a id="idx-quantize"></a>
|
|
596
632
|
### 🗜️ Two quantizations — one buys speed, one buys size
|
|
597
633
|
| | **Model weights** · INT8 ORT | **Index vectors** · INT4 binary |
|
|
598
634
|
|:--|:--|:--|
|
|
@@ -600,6 +636,7 @@ What it teaches:
|
|
|
600
636
|
| **Win** | **~2× faster** indexing · 4× smaller model (**132 MB**) | LI index **1.34 GiB → ~396 MiB** · INT4 nibble-packing halves it again |
|
|
601
637
|
| **Fidelity** | **≥ 0.96 cosine** vs FP32 | **no measurable retrieval loss** (A/B-tested vs INT8) |
|
|
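The "INT4 nibble-packing halves it again" claim in the table above comes down to storing two 4-bit values in each byte. The sketch below shows that idea only. The diff does not show the index's real encoding, so the signed range (-8..7), the low-nibble-first order, and the names `packInt4` / `unpackInt4` are assumptions.

```javascript
// Pack signed 4-bit integers (range -8..7) two per byte, low nibble first.
function packInt4(values) {
  const out = new Uint8Array(Math.ceil(values.length / 2));
  for (let i = 0; i < values.length; i++) {
    const v = values[i];
    if (!Number.isInteger(v) || v < -8 || v > 7) {
      throw new RangeError(`value ${v} does not fit in a signed 4-bit integer`);
    }
    const nibble = v & 0x0f; // low 4 bits of the two's-complement value
    out[i >> 1] |= (i & 1) ? nibble << 4 : nibble;
  }
  return out;
}

// Reverse of packInt4; `count` is needed because an odd-length input
// leaves the final high nibble unused.
function unpackInt4(bytes, count) {
  const out = new Int8Array(count);
  for (let i = 0; i < count; i++) {
    const byte = bytes[i >> 1];
    const nibble = (i & 1) ? byte >> 4 : byte & 0x0f;
    out[i] = nibble >= 8 ? nibble - 16 : nibble; // sign-extend 4 bits to 8
  }
  return out;
}
```

Five values occupy three bytes instead of five, which is where the roughly 2× saving over one-byte-per-dimension INT8 storage comes from.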
602
638
|
|
|
639
|
+
<a id="idx-embed"></a>
|
|
603
640
|
### 🤖 Two models — both open, both local, both code-specialized
|
|
604
641
|
- **[CodeRankEmbed](https://huggingface.co/nomic-ai/CodeRankEmbed)** — 768-d dense bi-encoder (137M, Apache-2.0) for first-stage recall.
|
|
605
642
|
- **[LateOn-Code](https://huggingface.co/lightonai/LateOn-Code)** — ModernBERT per-token **late interaction** (149M) for the rerank.
|
package/core/embedding/embedding-local-model.js
CHANGED
|
@@ -172,6 +172,7 @@ export function buildLocalSessionOptions(quantLabel = 'q8', coremlAvailable = fa
|
|
|
172
172
|
|
|
173
173
|
const sessionOptions = {
|
|
174
174
|
graphOptimizationLevel: 'all',
|
|
175
|
+
logSeverityLevel: 3, // ERROR — silence ORT's expected "optimized model is machine-specific" warning
|
|
175
176
|
intraOpNumThreads: intraOpThreads,
|
|
176
177
|
interOpNumThreads: interOpThreads,
|
|
177
178
|
executionMode,
|
package/core/graph/relationship-resolver.js
CHANGED
|
@@ -160,7 +160,11 @@ export function resolveRelationshipTargets(db) {
|
|
|
160
160
|
|
|
161
161
|
resolveAll();
|
|
162
162
|
|
|
163
|
-
|
|
163
|
+
if (resolved > 0) {
|
|
164
|
+
console.log(` ✓ Linked ${resolved}/${unresolved.length} references to local definitions`);
|
|
165
|
+
} else {
|
|
166
|
+
console.log(` ${unresolved.length} references resolve to external/library symbols (no local definition to link)`);
|
|
167
|
+
}
|
|
164
168
|
if (ambiguous > 0) {
|
|
165
169
|
console.log(` ⚠ ${ambiguous} ambiguous targets (multiple matches)`);
|
|
166
170
|
}
|
package/core/indexing/index-codebase-v21.js
CHANGED
|
@@ -140,10 +140,6 @@ async function main() {
|
|
|
140
140
|
applyPersistedLiModel(process.env.SWEET_SEARCH_PROJECT_ROOT || process.cwd());
|
|
141
141
|
}
|
|
142
142
|
|
|
143
|
-
log(`${colors.bright}╔═══════════════════════════════════════════════════╗${colors.reset}`, 'bright');
|
|
144
|
-
log(`${colors.bright}║ Sweet Search Codebase Indexer v2.3 (SOTA Dec'25) ║${colors.reset}`, 'bright');
|
|
145
|
-
log(`${colors.bright}╚═══════════════════════════════════════════════════╝${colors.reset}`, 'bright');
|
|
146
|
-
|
|
147
143
|
if (vectorsOnly) {
|
|
148
144
|
log('⚠ WARNING: --vectors-only skips code graph rebuild', 'yellow');
|
|
149
145
|
log(' GraphRAG structural queries will use stale data', 'yellow');
|
|
@@ -338,13 +334,13 @@ Output:
|
|
|
338
334
|
}
|
|
339
335
|
|
|
340
336
|
// =========================================================================
|
|
341
|
-
// PHASE
|
|
337
|
+
// PHASE 1: Code Graph (if not --vectors-only)
|
|
342
338
|
// =========================================================================
|
|
343
339
|
let graphStats = { entities: 0, relationships: 0 };
|
|
344
340
|
let hcgsPromise = null;
|
|
345
341
|
|
|
346
342
|
if (!vectorsOnly) {
|
|
347
|
-
const graphResult = await runPhase('Code Graph
|
|
343
|
+
const graphResult = await runPhase('Code Graph', buildCodeGraphWithHCGSPhase, {
|
|
348
344
|
allFiles,
|
|
349
345
|
filesToIndex,
|
|
350
346
|
dryRun,
|
package/core/indexing/indexer-ann.js
CHANGED
|
@@ -396,7 +396,7 @@ function diversityFirstPermutationRowids(filePaths) {
|
|
|
396
396
|
// =============================================================================
|
|
397
397
|
|
|
398
398
|
export async function incrementalUpdateHNSW(dbPath, changedFiles, dryRun = false) {
|
|
399
|
-
log('\n━━━ Phase
|
|
399
|
+
log('\n━━━ Phase 4: HNSW Index (Incremental) ━━━', 'bright');
|
|
400
400
|
|
|
401
401
|
if (dryRun) {
|
|
402
402
|
log('DRY RUN: Skipping HNSW incremental update', 'magenta');
|
|
@@ -510,7 +510,7 @@ export async function incrementalUpdateHNSW(dbPath, changedFiles, dryRun = false
|
|
|
510
510
|
// =============================================================================
|
|
511
511
|
|
|
512
512
|
export async function buildHNSWIndex(dbPath, dryRun = false) {
|
|
513
|
-
log('\n━━━ Phase
|
|
513
|
+
log('\n━━━ Phase 4: HNSW Index ━━━', 'bright');
|
|
514
514
|
|
|
515
515
|
if (dryRun) {
|
|
516
516
|
log('DRY RUN: Skipping HNSW index', 'magenta');
|
|
@@ -705,7 +705,7 @@ export async function buildLateInteractionIndex(chunks, dryRun = false, filesToR
|
|
|
705
705
|
segmentSize = null, // override SSLX-v3 segment threshold (default 10k)
|
|
706
706
|
projectRoot, // honored by LI skip policy for .sweet-search.config.json excludes
|
|
707
707
|
} = options;
|
|
708
|
-
log('\n━━━ Phase
|
|
708
|
+
log('\n━━━ Phase 3: Late Interaction Index (LateOn-Code) ━━━', 'bright');
|
|
709
709
|
|
|
710
710
|
if (dryRun) {
|
|
711
711
|
log('DRY RUN: Skipping late interaction index', 'magenta');
|
package/core/indexing/indexer-build.js
CHANGED
|
@@ -643,7 +643,7 @@ export async function chunkFiles(files) {
|
|
|
643
643
|
try {
|
|
644
644
|
const enriched = await enrichChunksFromGraph(allChunks, ASTChunker);
|
|
645
645
|
if (enriched > 0) {
|
|
646
|
-
log(`✓
|
|
646
|
+
log(`✓ Added scope/import context to ${enriched} code chunks`, 'green');
|
|
647
647
|
}
|
|
648
648
|
} catch (err) {
|
|
649
649
|
log(`⚠ Chunk enrichment skipped: ${err.message}`, 'yellow');
|
package/core/indexing/indexer-utils.js
CHANGED
|
@@ -110,19 +110,22 @@ export function isVerboseMode() {
|
|
|
110
110
|
}
|
|
111
111
|
|
|
112
112
|
// ---------------------------------------------------------------------------
|
|
113
|
-
// Progress rendering —
|
|
113
|
+
// Progress rendering — a live region of animated, in-place bars.
|
|
114
114
|
//
|
|
115
|
-
// On a TTY (verbose
|
|
116
|
-
// + erase-to-EOL, with smooth 1/8-block fill.
|
|
117
|
-
//
|
|
118
|
-
//
|
|
119
|
-
//
|
|
115
|
+
// On a TTY (verbose included), each phase's bar animates in place via cursor
|
|
116
|
+
// moves + erase-to-EOL, with smooth 1/8-block fill. Multiple bars can run at
|
|
117
|
+
// once (e.g. Embedding + Late Interaction in parallel) — they share one pinned
|
|
118
|
+
// region at the bottom and update independently. While bars are live, log()
|
|
119
|
+
// prints its line above the region and redraws the bars below, so diagnostics
|
|
120
|
+
// never split a bar. The region "commits" (stays on screen) once every bar in
|
|
121
|
+
// it has reached 100%. Non-TTY (pipes / CI) falls back to throttled newlines.
|
|
120
122
|
// ---------------------------------------------------------------------------
|
|
121
123
|
const BAR_WIDTH = 30;
|
|
122
124
|
const LABEL_COL = 17; // pad "Label:" to this width so every bar's [ ] aligns
|
|
123
125
|
const SUB_BLOCKS = ['', '▏', '▎', '▍', '▌', '▋', '▊', '▉']; // eighth-block partial fills
|
|
124
126
|
const CLEAR_EOL = '\x1b[K';
|
|
125
|
-
|
|
127
|
+
const liveBars = new Map(); // label -> { current, total }; insertion order = display order
|
|
128
|
+
let regionLines = 0; // bar lines currently pinned at the bottom (TTY)
|
|
126
129
|
let lastLoggedPercent = {};
|
|
127
130
|
|
|
128
131
|
function renderBar(current, total, label) {
|
|
@@ -137,12 +140,21 @@ function renderBar(current, total, label) {
|
|
|
137
140
|
return `${colors.cyan}${head}[${bar}${empty}] ${pct}% (${current}/${total})${colors.reset}`;
|
|
138
141
|
}
|
|
139
142
|
|
|
143
|
+
function drawRegion() {
|
|
144
|
+
let out = regionLines > 0 ? `\x1b[${regionLines}A\r` : '\r';
|
|
145
|
+
for (const [label, b] of liveBars) out += renderBar(b.current, b.total, label) + CLEAR_EOL + '\n';
|
|
146
|
+
process.stdout.write(out);
|
|
147
|
+
regionLines = liveBars.size;
|
|
148
|
+
}
|
|
149
|
+
|
|
140
150
|
export function log(message, color = 'reset') {
|
|
141
151
|
if (quietMode) return;
|
|
142
152
|
const line = `${colors[color]}${message}${colors.reset}`;
|
|
143
|
-
if (
|
|
144
|
-
//
|
|
145
|
-
|
|
153
|
+
if (regionLines > 0 && process.stdout.isTTY) {
|
|
154
|
+
// Print the line above the pinned bars, then redraw the bars below it.
|
|
155
|
+
let out = `\x1b[${regionLines}A\r${line}${CLEAR_EOL}\n`;
|
|
156
|
+
for (const [label, b] of liveBars) out += renderBar(b.current, b.total, label) + CLEAR_EOL + '\n';
|
|
157
|
+
process.stdout.write(out);
|
|
146
158
|
} else {
|
|
147
159
|
console.log(line);
|
|
148
160
|
}
|
|
@@ -160,13 +172,16 @@ export function logProgress(current, total, label) {
|
|
|
160
172
|
}
|
|
161
173
|
return;
|
|
162
174
|
}
|
|
163
|
-
// Interactive TTY:
|
|
164
|
-
|
|
165
|
-
|
|
166
|
-
|
|
167
|
-
|
|
168
|
-
|
|
169
|
-
|
|
175
|
+
// Interactive TTY: update this bar in the live region and redraw.
|
|
176
|
+
liveBars.set(label, { current, total });
|
|
177
|
+
drawRegion();
|
|
178
|
+
// Once every live bar is complete, commit the region (leave it on screen).
|
|
179
|
+
let allDone = true;
|
|
180
|
+
for (const b of liveBars.values()) if (b.current < b.total) { allDone = false; break; }
|
|
181
|
+
if (allDone) {
|
|
182
|
+
for (const k of liveBars.keys()) lastLoggedPercent[k] = 0;
|
|
183
|
+
liveBars.clear();
|
|
184
|
+
regionLines = 0;
|
|
170
185
|
}
|
|
171
186
|
}
|
|
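The pinned-region redraw used by `drawRegion()` and `log()` above boils down to one string per frame: move the cursor up over the bars already on screen, optionally print a log line, then rewrite every bar with erase-to-end-of-line. The sketch below isolates that string building as a pure function so it can be inspected without a terminal. `buildFrame` is a hypothetical name, and the plain `label: current/total` text is a stand-in for the real `renderBar()` output.

```javascript
const CLEAR_EOL = '\x1b[K'; // erase from cursor to end of line

// Build the text written for one redraw of the live region.
// `bars` is a Map of label -> { current, total } (insertion order = display order).
// `regionLines` is how many bar lines the previous draw left on screen.
// `logLine`, when given, is printed above the bars before they are redrawn.
function buildFrame(bars, regionLines, logLine = null) {
  // Cursor-up by the height of the old region, then return to column 0.
  let out = regionLines > 0 ? `\x1b[${regionLines}A\r` : '\r';
  if (logLine !== null) out += logLine + CLEAR_EOL + '\n';
  for (const [label, b] of bars) {
    // Simplified bar text; the real code renders a block-character bar here.
    out += `${label}: ${b.current}/${b.total}` + CLEAR_EOL + '\n';
  }
  return out;
}
```

Because every line ends in erase-to-EOL plus a newline, a shorter line never leaves stale characters behind, and the cursor always ends just below the last bar, which is what makes the next cursor-up count equal to the number of bars.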
172
187
|
|
package/core/infrastructure/onnx-session-utils.js
CHANGED
|
@@ -192,6 +192,7 @@ export function buildSessionOptions(modelId, suffix, coremlAvailable = false, ru
|
|
|
192
192
|
?? parseInt(process.env.SWEET_SEARCH_ORT_INTER_OP_THREADS || '1', 10);
|
|
193
193
|
const opts = {
|
|
194
194
|
graphOptimizationLevel: 'all',
|
|
195
|
+
logSeverityLevel: 3, // ERROR — silence ORT's expected "optimized model is machine-specific" warning
|
|
195
196
|
intraOpNumThreads: runtimeOptions.intraOpThreads ?? bestIntraOpThreads(runtimeOptions),
|
|
196
197
|
interOpNumThreads: interOpThreads,
|
|
197
198
|
executionMode,
|
package/core/ranking/late-interaction-model.js
CHANGED
|
@@ -193,6 +193,7 @@ async function loadModel() {
|
|
|
193
193
|
const { getOptimizedGraphPath } = await import('../infrastructure/onnx-session-utils.js');
|
|
194
194
|
const session = await ort.InferenceSession.create(onnxPath, {
|
|
195
195
|
executionProviders: ['cpu'],
|
|
196
|
+
logSeverityLevel: 3, // ERROR — silence ORT's expected "optimized model is machine-specific" warning
|
|
196
197
|
intraOpNumThreads: lateInteractionRuntimeConfig.intraOpThreads ?? bestIntraOpThreads(),
|
|
197
198
|
interOpNumThreads: 1,
|
|
198
199
|
optimizedModelFilePath: getOptimizedGraphPath(modelConfig.hfId, 'lateon'),
|
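The same `logSeverityLevel: 3` line is added to three session-option objects in this release. ONNX Runtime's severity scale runs from 0 (verbose) to 4 (fatal), with 2 (warning) as the default, so a threshold of 3 hides every warning, not only the "optimized model is machine-specific" one named in the comment. The sketch below spells that out. `ORT_SEVERITY`, `isEmitted`, and `withQuietLogging` are hypothetical names for illustration and are not part of the package or of ONNX Runtime.

```javascript
// ONNX Runtime logging severities, lowest to highest.
const ORT_SEVERITY = Object.freeze({ VERBOSE: 0, INFO: 1, WARNING: 2, ERROR: 3, FATAL: 4 });

// A message is emitted when its severity is at or above the session threshold.
function isEmitted(messageSeverity, threshold) {
  return messageSeverity >= threshold;
}

// Hypothetical helper: one place to set the quiet default, which callers
// can still override by passing their own logSeverityLevel.
function withQuietLogging(options = {}) {
  return { logSeverityLevel: ORT_SEVERITY.ERROR, ...options };
}
```

A shared helper along these lines would keep the three call sites in step if the threshold ever changes. Whether that is worth the indirection for a one-line option is a judgment call.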
package/package.json
CHANGED
|
@@ -1,6 +1,6 @@
|
|
|
1
1
|
{
|
|
2
2
|
"name": "sweet-search",
|
|
3
|
-
"version": "2.5.
|
|
3
|
+
"version": "2.5.7",
|
|
4
4
|
"description": "Sweet Search - SOTA Hybrid Code Search Engine with WASM CatBoost Query Router, Semantic/Lexical/Structural Search, and Multilingual Support",
|
|
5
5
|
"type": "module",
|
|
6
6
|
"main": "core/search/sweet-search.js",
|
|
@@ -163,12 +163,12 @@
|
|
|
163
163
|
},
|
|
164
164
|
"optionalDependencies": {
|
|
165
165
|
"usearch": "^2.21.4",
|
|
166
|
-
"@sweet-search/native-darwin-arm64": "2.5.
|
|
167
|
-
"@sweet-search/native-darwin-x64": "2.5.
|
|
168
|
-
"@sweet-search/native-linux-arm64-gnu": "2.5.
|
|
169
|
-
"@sweet-search/native-linux-arm64-gnu-cuda": "2.5.
|
|
170
|
-
"@sweet-search/native-linux-x64-gnu": "2.5.
|
|
171
|
-
"@sweet-search/native-linux-x64-gnu-cuda": "2.5.
|
|
166
|
+
"@sweet-search/native-darwin-arm64": "2.5.7",
|
|
167
|
+
"@sweet-search/native-darwin-x64": "2.5.7",
|
|
168
|
+
"@sweet-search/native-linux-arm64-gnu": "2.5.7",
|
|
169
|
+
"@sweet-search/native-linux-arm64-gnu-cuda": "2.5.7",
|
|
170
|
+
"@sweet-search/native-linux-x64-gnu": "2.5.7",
|
|
171
|
+
"@sweet-search/native-linux-x64-gnu-cuda": "2.5.7"
|
|
172
172
|
},
|
|
173
173
|
"engines": {
|
|
174
174
|
"node": ">=18.0.0"
|
package/scripts/init.js
CHANGED
|
@@ -252,9 +252,26 @@ export function detectProjectRoot(cwd = process.cwd()) {
|
|
|
252
252
|
export function ensureDataDir(projectRoot) {
|
|
253
253
|
const dataDir = join(projectRoot, DATA_DIR_NAME);
|
|
254
254
|
mkdirSync(dataDir, { recursive: true });
|
|
255
|
+
maybeIgnoreDataDir(projectRoot);
|
|
255
256
|
return dataDir;
|
|
256
257
|
}
|
|
257
258
|
|
|
259
|
+
// Add `.sweet-search/` to the project's .gitignore so the local index isn't
|
|
260
|
+
// committed — but ONLY if a .gitignore already exists. We never create one for
|
|
261
|
+
// a project that doesn't already use it.
|
|
262
|
+
function maybeIgnoreDataDir(projectRoot) {
|
|
263
|
+
try {
|
|
264
|
+
const gitignorePath = join(projectRoot, '.gitignore');
|
|
265
|
+
if (!existsSync(gitignorePath)) return;
|
|
266
|
+
const content = readFileSync(gitignorePath, 'utf8');
|
|
267
|
+
const already = content.split(/\r?\n/).map((l) => l.trim().replace(/^\//, '').replace(/\/$/, ''))
|
|
268
|
+
.some((l) => l === DATA_DIR_NAME);
|
|
269
|
+
if (already) return;
|
|
270
|
+
const sep = content.length === 0 || content.endsWith('\n') ? '' : '\n';
|
|
271
|
+
writeFileSync(gitignorePath, `${content}${sep}\n# Sweet Search local index\n${DATA_DIR_NAME}/\n`);
|
|
272
|
+
} catch { /* best-effort — never block init on .gitignore */ }
|
|
273
|
+
}
|
|
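The matching and append rules in `maybeIgnoreDataDir` above are easier to check when separated from the file I/O. The sketch below lifts them into two pure functions with the same regular expressions and the same separator logic as the hunk. The function names are hypothetical, and `DATA_DIR_NAME = '.sweet-search'` is an assumption inferred from the comment in the hunk, since the constant's definition is not part of this diff.

```javascript
const DATA_DIR_NAME = '.sweet-search'; // assumed value; the definition is outside this diff

// True if some line, after trimming and dropping one leading and one
// trailing slash, is exactly the data directory name.
function isAlreadyIgnored(content, dirName = DATA_DIR_NAME) {
  return content
    .split(/\r?\n/)
    .map((l) => l.trim().replace(/^\//, '').replace(/\/$/, ''))
    .some((l) => l === dirName);
}

// Return the .gitignore text with the entry appended, or unchanged if present.
function appendIgnoreEntry(content, dirName = DATA_DIR_NAME) {
  if (isAlreadyIgnored(content, dirName)) return content;
  // Add a newline first only when the file does not already end with one.
  const sep = content.length === 0 || content.endsWith('\n') ? '' : '\n';
  return `${content}${sep}\n# Sweet Search local index\n${dirName}/\n`;
}
```

Under these rules `.sweet-search`, `/.sweet-search`, and `.sweet-search/` all count as already present. Equivalent glob forms such as `**/.sweet-search` or `.sweet-search/*` would not, so a project using one of those would get a second, redundant entry, which git accepts without complaint.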
274
|
+
|
|
258
275
|
// ---------------------------------------------------------------------------
|
|
259
276
|
// Init config read/write
|
|
260
277
|
// ---------------------------------------------------------------------------
|
|
@@ -1576,11 +1593,10 @@ export async function runInit(args) {
|
|
|
1576
1593
|
const skippedOptIns = getSkippedOptInModels(profile);
|
|
1577
1594
|
let modelResults = new Map();
|
|
1578
1595
|
|
|
1579
|
-
//
|
|
1580
|
-
//
|
|
1581
|
-
//
|
|
1582
|
-
|
|
1583
|
-
if (skippedOptIns.length > 0) {
|
|
1596
|
+
// Opt-in models (e.g. cross-encoder rerankers, disabled by default since
|
|
1597
|
+
// commit 43a61eb) are skipped silently — they're optional features, not
|
|
1598
|
+
// missing models. Set DEBUG=1 to see which were skipped and how to enable.
|
|
1599
|
+
if (process.env.DEBUG && skippedOptIns.length > 0) {
|
|
1584
1600
|
for (const skipped of skippedOptIns) {
|
|
1585
1601
|
process.stderr.write(
|
|
1586
1602
|
`[init] Skipping opt-in model "${skipped.key}" — ` +
|
package/scripts/postinstall-banner.js
CHANGED
|
@@ -1,46 +1,34 @@
|
|
|
1
1
|
#!/usr/bin/env node
|
|
2
2
|
/**
|
|
3
|
-
* postinstall —
|
|
3
|
+
* postinstall — print a short "what next" message after install.
|
|
4
4
|
*
|
|
5
|
-
*
|
|
6
|
-
*
|
|
7
|
-
*
|
|
8
|
-
*
|
|
9
|
-
*
|
|
10
|
-
*
|
|
11
|
-
* SWEET_SEARCH_NO_BANNER, swallows every error, and always exits 0 so it can never
|
|
12
|
-
* fail `npm install`.
|
|
5
|
+
* Deliberately plain text: during `npm install` npm writes its own progress
|
|
6
|
+
* spinner to the terminal CONCURRENTLY with this script, which would interrupt
|
|
7
|
+
* any graphics/animation escape sequence mid-stream and leak its payload as
|
|
8
|
+
* garbage text. So the rich animated banner is reserved for `sweet-search init`
|
|
9
|
+
* and `sweet-search index` (where we own the TTY); install just prints a clean,
|
|
10
|
+
* escape-light pointer. Best-effort; always exits 0 so it can't fail an install.
|
|
13
11
|
*/
|
|
14
12
|
import process from 'node:process';
|
|
15
|
-
import tty from 'node:tty';
|
|
16
|
-
import { openSync, closeSync } from 'node:fs';
|
|
17
|
-
import { dirname, join } from 'node:path';
|
|
18
|
-
import { fileURLToPath } from 'node:url';
|
|
19
13
|
|
|
20
|
-
|
|
14
|
+
function run() {
|
|
21
15
|
const env = process.env;
|
|
22
|
-
if (env.
|
|
23
|
-
|
|
24
|
-
// Pick an output stream that is a real terminal.
|
|
25
|
-
let stream = process.stdout.isTTY ? process.stdout : null;
|
|
26
|
-
let ownedFd = -1;
|
|
27
|
-
if (!stream && process.platform !== 'win32') {
|
|
28
|
-
try {
|
|
29
|
-
ownedFd = openSync('/dev/tty', 'r+'); // throws if no controlling terminal
|
|
30
|
-
const s = new tty.WriteStream(ownedFd);
|
|
31
|
-
if (s.isTTY) stream = s;
|
|
32
|
-
} catch { /* no controlling terminal — skip */ }
|
|
33
|
-
}
|
|
34
|
-
if (!stream) return;
|
|
35
|
-
|
|
16
|
+
if (env.NO_BANNER || env.SWEET_SEARCH_NO_BANNER) return;
|
|
36
17
|
try {
|
|
37
|
-
const
|
|
38
|
-
const
|
|
39
|
-
|
|
40
|
-
|
|
41
|
-
|
|
18
|
+
const c = (n, s) => (process.stdout.isTTY ? `\x1b[${n}m${s}\x1b[0m` : s);
|
|
19
|
+
const lines = [
|
|
20
|
+
'',
|
|
21
|
+
` ${c('1;38;5;213', 'sweet-search')} installed ${c('2', '— SOTA hybrid code search')}`,
|
|
22
|
+
'',
|
|
23
|
+
` ${c('1', 'Get started:')}`,
|
|
24
|
+
` ${c('36', 'sweet-search init')} set up the current project`,
|
|
25
|
+
` ${c('36', 'sweet-search index')} build the search index`,
|
|
26
|
+
` ${c('36', 'sweet-search "query"')} search your code`,
|
|
27
|
+
` ${c('2', '(installed locally? prefix with')} ${c('2;36', 'npx')}${c('2', ', e.g. `npx sweet-search init`)')}`,
|
|
28
|
+
'',
|
|
29
|
+
];
|
|
30
|
+
process.stdout.write(lines.join('\n') + '\n');
|
|
42
31
|
} catch { /* never break an install */ }
|
|
43
|
-
finally { if (ownedFd >= 0) { try { closeSync(ownedFd); } catch { /* noop */ } } }
|
|
44
32
|
}
|
|
45
33
|
|
|
46
|
-
run()
|
|
34
|
+
run();
|